CopyFile
Copies a file. The main purpose is to recompress it.
Basic Usage
copyfile.sh in=<file> out=<file>
CopyFile copies a file through a 16,384-byte buffer and always runs it through the compression/decompression layer. Unlike the copy routine in ReadWrite, it calls ReadWrite.getInputStream() and ReadWrite.getOutputStream() even when the input and output files have identical extensions, so no extension-based shortcut is taken and benchmark results stay consistent.
Parameters
CopyFile accepts the following parameters for file copying operations:
File I/O Parameters
- in=<file>
- Input file to copy. This parameter is required.
- out=<file>
- Output destination file. This parameter is required.
Operation Parameters
- overwrite=true
- (ow) Overwrite existing output files. Default: true
- append=false
- (app) Append to existing output file instead of overwriting. Default: false
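One way to picture how the overwrite and append flags could interact is sketched below. This is an illustration, not the BBTools code: the class OutputModeSketch and its open method are hypothetical, and letting append take precedence over overwrite is an assumption made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not part of BBTools.
final class OutputModeSketch {

    // Opens 'dest' according to the two flags.
    // Assumption: append wins over overwrite when both are set.
    static OutputStream open(Path dest, boolean overwrite, boolean append) throws IOException {
        if (append) {
            // Create the file if it is missing, otherwise add to its end.
            return Files.newOutputStream(dest, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        if (overwrite) {
            // Replace any existing content.
            return Files.newOutputStream(dest, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
        // overwrite=false: CREATE_NEW throws FileAlreadyExistsException if the file exists.
        return Files.newOutputStream(dest, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempFile("copyfile-sketch", ".txt");
        try (OutputStream out = open(dest, true, false)) {
            out.write("first\n".getBytes(StandardCharsets.UTF_8));
        }
        try (OutputStream out = open(dest, true, true)) {
            out.write("second\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.print(Files.readString(dest)); // both lines are present after the append
        try (OutputStream out = open(dest, false, false)) {
            out.write(0);
        } catch (FileAlreadyExistsException e) {
            System.out.println("refused to overwrite " + dest.getFileName());
        }
        Files.delete(dest);
    }
}
```

With overwrite=false and append=false the open call fails on an existing file, which matches the behavior described for the Prevent Overwrite example.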
Examples
Basic File Copy
copyfile.sh in=input.txt out=output.txt
Copies input.txt to output.txt with default settings.
Compression/Decompression
copyfile.sh in=data.txt out=data.txt.gz
Compresses a text file to gzip format.
Append Mode
copyfile.sh in=new_data.txt out=existing_file.txt append=true
Appends the content of new_data.txt to existing_file.txt.
Prevent Overwrite
copyfile.sh in=source.fq out=backup.fq overwrite=false
Copies source.fq to backup.fq but fails if backup.fq already exists.
Positional Arguments
copyfile.sh input.fasta output.fasta
Files can also be specified as positional arguments without parameter names.
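As an illustration of how key=value and positional arguments might be reconciled, here is a minimal sketch. ArgSketch is hypothetical and is not the BBTools argument parser; the rule that the first bare argument is taken as in and the second as out is an assumption inferred from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser sketch; not the BBTools argument parser.
final class ArgSketch {

    // Maps arguments to "in" and "out", plus any other key=value pairs.
    // Assumption: bare arguments fill "in" first, then "out".
    static Map<String, String> parse(String... args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                // Named form: key=value.
                parsed.put(arg.substring(0, eq).toLowerCase(), arg.substring(eq + 1));
            } else if (!parsed.containsKey("in")) {
                parsed.put("in", arg);
            } else if (!parsed.containsKey("out")) {
                parsed.put("out", arg);
            } else {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse("input.fasta", "output.fasta"));
        // prints {in=input.fasta, out=output.fasta}
        System.out.println(parse("in=input.txt", "out=output.txt", "append=true"));
        // prints {in=input.txt, out=output.txt, append=true}
    }
}
```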
Algorithm Details
CopyFile implements a straightforward file copying algorithm with the following technical characteristics:
Copy Implementation
- Buffer Size: Uses a 16,384-byte (16 KB) buffer allocated as new byte[16384]
- Stream Handling: Calls ReadWrite.getInputStream(source, false, true) and ReadWrite.getOutputStream(dest, false, false, true) for dynamic format detection
- Read-Write Loop: Standard buffered copying using while((len = in.read(buffer)) > 0) { out.write(buffer, 0, len); }
- Force Processing: Unlike ReadWrite's version, forces compression/decompression regardless of file extensions for benchmarking purposes
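The copy loop described above can be sketched with standard-library streams. This is a simplified stand-in, not the BBTools source: ReadWrite.getInputStream() and ReadWrite.getOutputStream() are replaced by hypothetical openInput and openOutput helpers that wrap .gz files in java.util.zip streams, and the class name CopyLoopSketch is invented for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Simplified stand-in for the copy routine; only gzip is handled here.
final class CopyLoopSketch {

    // Hypothetical replacement for ReadWrite.getInputStream(): always decodes .gz input.
    static InputStream openInput(String name) throws IOException {
        InputStream raw = new FileInputStream(name);
        return name.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Hypothetical replacement for ReadWrite.getOutputStream(): always encodes .gz output.
    static OutputStream openOutput(String name) throws IOException {
        OutputStream raw = new FileOutputStream(name);
        return name.endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
    }

    // Copies through a 16,384-byte buffer and returns the number of uncompressed bytes moved.
    static long copy(String source, String dest) {
        byte[] buffer = new byte[16384];
        long total = 0;
        try (InputStream in = openInput(source); OutputStream out = openOutput(dest)) {
            int len;
            while ((len = in.read(buffer)) > 0) {
                out.write(buffer, 0, len);
                total += len;
            }
        } catch (IOException e) {
            // Checked I/O failures are rethrown unchecked.
            throw new RuntimeException(e);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copyfile-sketch");
        Path plain = dir.resolve("data.txt");
        Files.write(plain, "ACGT\n".repeat(10000).getBytes(StandardCharsets.US_ASCII));
        String gz = dir.resolve("data.txt.gz").toString();
        String regz = dir.resolve("recompressed.txt.gz").toString();
        System.out.println(copy(plain.toString(), gz)); // prints 50000
        // .gz to .gz still decompresses and recompresses: no extension shortcut is taken.
        System.out.println(copy(gz, regz));             // prints 50000
    }
}
```

Because neither helper compares the source and destination extensions, a .gz to .gz copy pays the full decode and encode cost, which is the property that makes this style of copy useful for benchmarking.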
Performance Characteristics
- Speed Calculation: Uses Timer class with formula bytes*1000d/t.elapsed for MB/s calculation
- Memory Usage: JVM allocation of 120MB (-Xmx120m -Xms120m) configured in shell script
- Timing Precision: shared.Timer class provides millisecond-precision performance measurement
- Error Handling: Wraps FileNotFoundException and IOException in RuntimeException
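A worked example of the throughput formula may help. The expression bytes*1000d/elapsed yields megabytes per second (with 1 MB = 1,000,000 bytes) only when the elapsed value is in nanoseconds, so that unit is assumed here. The class ThroughputSketch is hypothetical and uses System.nanoTime() in place of the Timer class.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the arithmetic; not the BBTools Timer class.
final class ThroughputSketch {

    // Assumption: elapsedNanos is in nanoseconds, the unit under which this formula gives MB/s.
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        return bytes * 1000d / elapsedNanos;
    }

    public static void main(String[] args) {
        // 50,000,000 bytes in 0.5 s (500,000,000 ns) is 100 MB/s.
        System.out.println(megabytesPerSecond(50_000_000L, 500_000_000L)); // prints 100.0

        // Timing a stand-in workload; the reported figure depends on the machine.
        long start = System.nanoTime();
        long bytes = 0;
        byte[] buffer = new byte[16384];
        for (int i = 0; i < 10000; i++) {
            Arrays.fill(buffer, (byte) i); // placeholder for real I/O work
            bytes += buffer.length;
        }
        long elapsed = Math.max(1, System.nanoTime() - start); // avoid division by zero
        System.out.printf("%.2f MB/s%n", megabytesPerSecond(bytes, elapsed));
    }
}
```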
Special Features
- ZIP Stream Handling: Detects ZipOutputStream.class and calls zos.closeEntry() and zos.finish()
- Path Creation: Uses File.getParentFile() and parent.mkdirs() when createPathIfNeeded=true
- Synchronization: copyFile() method is declared synchronized for thread safety
- Format Detection: Leverages ReadWrite class format detection based on file extensions and compression signatures
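The ZIP and path-creation steps can be illustrated with the standard library. This is a sketch under assumptions, not the BBTools code: the class ZipFinishSketch, its methods, and the entry name payload.txt are hypothetical; only ZipOutputStream.closeEntry(), ZipOutputStream.finish(), File.getParentFile(), and File.mkdirs() come from the description above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; names other than the java.* calls are invented for illustration.
final class ZipFinishSketch {

    // Creates missing parent directories, as described for createPathIfNeeded=true.
    static void ensureParent(File dest) {
        File parent = dest.getParentFile();
        if (parent != null && !parent.exists()) {
            parent.mkdirs();
        }
    }

    // Completes the archive only when the stream is exactly a ZipOutputStream.
    static void finishIfZip(OutputStream out) throws IOException {
        if (out.getClass() == ZipOutputStream.class) {
            ZipOutputStream zos = (ZipOutputStream) out;
            zos.closeEntry(); // end the current entry
            zos.finish();     // write the archive trailer without closing the underlying stream
        }
    }

    // Writes 'text' as a single entry, then reads it back to show the archive is valid.
    // Declared synchronized so only one call runs at a time.
    static synchronized String roundTrip(File dest, String text) {
        ensureParent(dest);
        try {
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(dest))) {
                zos.putNextEntry(new ZipEntry("payload.txt")); // entry name is an assumption
                OutputStream out = zos; // treated as a generic stream from here on
                out.write(text.getBytes(StandardCharsets.UTF_8));
                finishIfZip(out);
            }
            try (ZipInputStream zis = new ZipInputStream(new FileInputStream(dest))) {
                zis.getNextEntry();
                return new String(zis.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = Files.createTempDirectory("copyfile-sketch").toFile();
        File dest = new File(tmp, "nested/dir/out.zip");
        System.out.println(roundTrip(dest, "hello")); // prints hello
    }
}
```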
Use Cases
CopyFile is primarily designed for benchmarking due to its forced processing behavior:
- I/O Performance Benchmarking: Forces processing even with identical extensions, ensuring consistent compression/decompression timing
- Format Conversion: Convert between .txt, .gz, .zip, .bz2 formats using ReadWrite format detection
- Compression Overhead Testing: Measure the performance impact of different compression algorithms
- File Recompression: The main purpose stated in the source comments is recompressing existing files
- Speed Testing: Built-in MB/s reporting using file size and Timer elapsed time
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org