CopyFile

Basic Usage

copyfile.sh in=<file> out=<file>

CopyFile copies a file through a 16,384-byte buffer and always routes the data through compression and decompression streams. Unlike the copy routine in ReadWrite, which can skip this step based on file extensions, it calls ReadWrite.getInputStream() and ReadWrite.getOutputStream() even when the input and output files have the same extension. Removing this extension-based shortcut keeps timings consistent for benchmarking.

Parameters

CopyFile accepts the following parameters for file copying operations:

File I/O Parameters

in=<file>: Input file to copy. This parameter is required.
out=<file>: Output destination file. This parameter is required.

Operation Parameters

overwrite=true: (ow) Overwrite existing output files. Default: true
append=false: (app) Append to existing output file instead of overwriting. Default: false
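The sketch below shows one way these two parameters could interact when the output file is opened. OutputPolicy.open is a hypothetical Java helper, not part of BBTools. It assumes that append=true takes precedence and that overwrite=false only blocks replacing an existing file when append is off.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of how overwrite= and append= could govern opening the output file. */
final class OutputPolicy {

    /**
     * Opens dest for writing.
     * Assumption: append takes precedence, and overwrite=false only refuses
     * to replace an existing file when not appending.
     */
    static OutputStream open(File dest, boolean overwrite, boolean append) throws IOException {
        if (dest.exists() && !overwrite && !append) {
            throw new IOException("Output file " + dest + " exists and overwrite=false");
        }
        // FileOutputStream(File, boolean): true appends to the end, false truncates first.
        return new FileOutputStream(dest, append);
    }
}
```

Under these assumptions, the Prevent Overwrite example below fails when backup.fq already exists, while append=true writes after the existing content.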

Examples

Basic File Copy

copyfile.sh in=input.txt out=output.txt

Copies input.txt to output.txt with default settings.

Compression/Decompression

copyfile.sh in=data.txt out=data.txt.gz

Compresses a text file to gzip format; the output compression is chosen from the .gz extension.

Append Mode

copyfile.sh in=new_data.txt out=existing_file.txt append=true

Appends the content of new_data.txt to existing_file.txt.

Prevent Overwrite

copyfile.sh in=source.fq out=backup.fq overwrite=false

Copies source.fq to backup.fq but fails if backup.fq already exists.

Positional Arguments

copyfile.sh input.fasta output.fasta

The input and output files can also be given as positional arguments, in that order, without parameter names.
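A minimal parser for both the key=value and positional forms might look like the following. ArgSketch is a hypothetical class, not BBTools' actual argument parser; the rule that the first bare argument is the input and the second is the output, and the accepted boolean spellings (true or t), are assumptions.

```java
import java.util.Locale;

/** Hypothetical sketch: maps key=value and positional arguments to CopyFile-style settings. */
final class ArgSketch {
    String in, out;
    boolean overwrite = true;  // documented default
    boolean append = false;    // documented default

    static ArgSketch parse(String... args) {
        ArgSketch a = new ArgSketch();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq < 0) {
                // Positional (assumed order): first bare argument is the input, second the output.
                if (a.in == null) { a.in = arg; }
                else if (a.out == null) { a.out = arg; }
                else { throw new IllegalArgumentException("Unexpected argument: " + arg); }
                continue;
            }
            String key = arg.substring(0, eq).toLowerCase(Locale.ROOT);
            String value = arg.substring(eq + 1);
            switch (key) {
                case "in": a.in = value; break;
                case "out": a.out = value; break;
                case "overwrite": case "ow": a.overwrite = parseBool(value); break;
                case "append": case "app": a.append = parseBool(value); break;
                default: throw new IllegalArgumentException("Unknown parameter: " + arg);
            }
        }
        if (a.in == null || a.out == null) {
            throw new IllegalArgumentException("Both in= and out= are required");
        }
        return a;
    }

    // Assumption: "t" and "true" (any case) mean true; anything else means false.
    private static boolean parseBool(String v) {
        return v.equalsIgnoreCase("t") || v.equalsIgnoreCase("true");
    }
}
```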

Algorithm Details

CopyFile implements a straightforward file copying algorithm with the following technical characteristics:

Copy Implementation

Buffer Size: Uses a 16,384-byte (16 KB) buffer, allocated as new byte[16384]
Stream Handling: Calls ReadWrite.getInputStream(source, false, true) and ReadWrite.getOutputStream(dest, false, false, true) for dynamic format detection
Read-Write Loop: Standard buffered copying using while((len = in.read(buffer)) > 0) { out.write(buffer, 0, len); }
Force Processing: Unlike ReadWrite's own copy routine, always forces compression/decompression regardless of file extensions, for benchmarking purposes
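The copy path described above can be sketched in Java. The openIn and openOut helpers are hypothetical stand-ins for ReadWrite.getInputStream() and ReadWrite.getOutputStream() and recognize only the .gz extension, using java.util.zip. The buffer size and read-write loop follow the description; I/O errors are rethrown as UncheckedIOException, one RuntimeException subclass.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Sketch of a forced-processing copy. openIn/openOut are hypothetical stand-ins
 * for ReadWrite's stream factories and only understand ".gz".
 */
final class ForcedCopy {

    static InputStream openIn(String path) throws IOException {
        InputStream raw = new BufferedInputStream(new FileInputStream(path));
        return path.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    static OutputStream openOut(String path) throws IOException {
        OutputStream raw = new BufferedOutputStream(new FileOutputStream(path));
        return path.endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
    }

    /** Copies through codec streams even when both names end in ".gz"; returns uncompressed bytes copied. */
    static long copy(String source, String dest) {
        byte[] buffer = new byte[16384]; // same buffer size as described for CopyFile
        long total = 0;
        try (InputStream in = openIn(source); OutputStream out = openOut(dest)) {
            int len;
            while ((len = in.read(buffer)) > 0) {
                out.write(buffer, 0, len);
                total += len;
            }
        } catch (IOException e) {
            // Checked I/O errors are wrapped in an unchecked RuntimeException subclass.
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Because the sketch never compares extensions, copying a.txt.gz to b.txt.gz decompresses and recompresses the data instead of duplicating raw bytes, which is the behavior the benchmarking use relies on.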

Performance Characteristics

Speed Calculation: Uses the Timer class with the formula bytes*1000d/t.elapsed to report MB/s
Memory Usage: The shell script sets the JVM heap to 120 MB (-Xmx120m -Xms120m)
Timing Precision: shared.Timer records elapsed time in nanoseconds, which is why multiplying bytes per nanosecond by 1000 yields megabytes per second
Error Handling: Wraps FileNotFoundException and IOException in RuntimeException
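The MB/s formula is easiest to check with concrete numbers, assuming t.elapsed is in nanoseconds (the unit for which bytes*1000d/t.elapsed comes out in megabytes per second). Bytes per nanosecond times 10^9 gives bytes per second, and dividing by 10^6 gives MB/s, for a net factor of 1000. Copying 1,000,000,000 bytes in 2 seconds (2,000,000,000 ns) therefore gives 1,000,000,000 * 1000 / 2,000,000,000 = 500 MB/s. The sketch uses System.nanoTime() as an assumed equivalent of shared.Timer; SpeedSketch is a hypothetical name.

```java
/** Hypothetical sketch of the MB/s arithmetic, assuming nanosecond timing. */
final class SpeedSketch {

    /** Times an action with System.nanoTime(), assumed here to match shared.Timer's clock. */
    static long elapsedNanos(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return System.nanoTime() - start;
    }

    /**
     * bytes/ns * 1e9 = bytes/s, then / 1e6 = MB/s; the net factor of 1000
     * matches the bytes*1000d/t.elapsed formula.
     */
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        return bytes * 1000d / elapsedNanos;
    }
}
```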

Special Features

ZIP Stream Handling: If the output stream's class is ZipOutputStream, calls zos.closeEntry() and zos.finish() so the archive is completed
Path Creation: Uses File.getParentFile() and parent.mkdirs() when createPathIfNeeded=true
Synchronization: The copyFile() method is declared synchronized for thread safety
Format Detection: Leverages ReadWrite class format detection based on file extensions and compression signatures
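The ZIP finalization and path creation steps can be sketched with java.util.zip. ZipSketch and its methods are hypothetical, and the single-entry layout and entry name are assumptions for illustration; the exact class comparison mirrors the ZipOutputStream.class check described above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of ZIP finalization and parent-directory creation. */
final class ZipSketch {

    /** Creates missing parent directories via getParentFile() and mkdirs(). */
    static void ensureParent(File dest) {
        File parent = dest.getParentFile();
        if (parent != null && !parent.exists()) {
            parent.mkdirs();
        }
    }

    /** Finishes a ZIP stream explicitly before closing; other streams are just closed. */
    static void finishAndClose(OutputStream out) throws IOException {
        if (out.getClass() == ZipOutputStream.class) {
            ZipOutputStream zos = (ZipOutputStream) out;
            zos.closeEntry(); // complete the current entry
            zos.finish();     // write the central directory
        }
        out.close();
    }

    /** Writes data as a single-entry ZIP archive (layout assumed for illustration). */
    static void writeZip(File dest, String entryName, byte[] data, boolean createPathIfNeeded) throws IOException {
        if (createPathIfNeeded) {
            ensureParent(dest);
        }
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(dest));
        zos.putNextEntry(new ZipEntry(entryName));
        zos.write(data, 0, data.length);
        finishAndClose(zos);
    }
}
```

closeEntry() completes the current entry and finish() writes the ZIP central directory, which readers need in order to list the archive's entries.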

Use Cases

CopyFile is primarily designed for benchmarking due to its forced processing behavior:

I/O Performance Benchmarking: Forces processing even with identical extensions, ensuring consistent compression/decompression timing
Format Conversion: Convert between uncompressed (.txt), .gz, .zip, and .bz2 formats using ReadWrite format detection
Compression Overhead Testing: Measure the performance impact of different compression algorithms
File Recompression: Recompressing existing files, which the source comments describe as the tool's main purpose
Speed Testing: Built-in MB/s reporting using file size and Timer elapsed time

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org