Unzip

Script: unzip.sh
Package: jgi
Class: Unzip.java

Compresses or decompresses files based on their extensions. This tool exists only because the syntax and default behavior of many compression utilities are unintuitive. It is just a wrapper and relies on existing command-line executables (pigz, lbzip2, etc.). It does not delete the input file, and it does not untar files.

Basic Usage

unzip.sh in=<file> out=<file>

The unzip tool detects the compression format from the input and output file extensions and applies the appropriate compression or decompression. It leaves the input file untouched and writes the result to a separate output.
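As a rough illustration of extension-based detection, the sketch below maps file names to a format and picks an action by comparing the input and output extensions. It is a hypothetical stand-in (ExtensionSniffer, codecFor and action are invented names) that only knows .gz and .bz2; in BBTools the detection is done by the FileFormat class, which may recognize more formats and apply different rules.

```java
import java.util.Locale;

// Hypothetical sketch of extension-based format detection.
// Not the FileFormat logic used by Unzip.java; only .gz and .bz2 are handled.
class ExtensionSniffer {

    enum Codec { GZIP, BZIP2, NONE }

    // Case-insensitive suffix match on the file name.
    static Codec codecFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz")) return Codec.GZIP;
        if (name.endsWith(".bz2")) return Codec.BZIP2;
        return Codec.NONE;
    }

    // Decompress when only the input is compressed, compress when only the
    // output is compressed, copy when both sides use the same format.
    static String action(String in, String out) {
        Codec src = codecFor(in);
        Codec dst = codecFor(out);
        if (src != Codec.NONE && dst == Codec.NONE) return "decompress " + src;
        if (src == Codec.NONE && dst != Codec.NONE) return "compress " + dst;
        if (src == dst) return "copy";
        return "recompress " + src + " -> " + dst;
    }

    public static void main(String[] args) {
        System.out.println(action("data.fq.gz", "data.fq")); // decompress GZIP
        System.out.println(action("data.fq", "data.fq.gz")); // compress GZIP
    }
}
```

Comparing both extensions, rather than only the input's, is what lets a single command cover compression, decompression and format conversion.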

Parameters

Parameters control input/output files, compression settings, and processing options.

File Parameters

in=<file>
Input file to compress or decompress. Required parameter.
out=<file>
Output file for processed data. If not specified, output goes to stdout.
invalid=<file>
Output file for invalid or problematic data during processing.

Compression Parameters

zl=<int>
Compression level; accepts 0-9 or 11. Higher values give smaller output but take longer to process. Default varies by compression algorithm.
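The sketch below shows one way a zl= value could be validated and applied to Java's built-in gzip encoder. Levels 0-9 map onto java.util.zip.Deflater levels; Deflater has no level 11, so the sketch rejects it for the in-process encoder. The assumption that 11 is intended for an external tool comes from pigz, where -11 selects Zopfli compression; the text above does not say how Unzip handles it. LevelSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: validate zl= and apply it to java.util.zip's gzip encoder.
class LevelSketch {

    // Accept 0-9 or 11, mirroring the zl= description.
    static int checkLevel(int zl) {
        if ((zl >= 0 && zl <= 9) || zl == 11) {
            return zl;
        }
        throw new IllegalArgumentException("zl must be 0-9 or 11, got " + zl);
    }

    // Deflater only supports 0-9; level 11 is assumed to need an external tool.
    static OutputStream gzipAtLevel(OutputStream out, int zl) throws IOException {
        final int level = checkLevel(zl);
        if (level > 9) {
            throw new IllegalArgumentException("zl=" + level + " is not supported by java.util.zip");
        }
        // GZIPOutputStream has no level parameter, so set it on the protected Deflater.
        return new GZIPOutputStream(out) {
            { def.setLevel(level); }
        };
    }

    static byte[] compress(byte[] data, int zl) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gz = gzipAtLevel(bytes, zl)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete gzip member, trailer written on close
    }

    static byte[] decompress(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The anonymous subclass is needed because GZIPOutputStream exposes its Deflater only through the protected def field.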

Processing Parameters

lines=<number>
Maximum number of lines to process. Use -1 for unlimited processing. Default: unlimited (Long.MAX_VALUE).
verbose=<boolean>
Enable verbose output for debugging and monitoring progress. Default: false.
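A minimal sketch of a lines= cap, assuming the limit counts lines of the decompressed text and that any negative value means unlimited (mapped to Long.MAX_VALUE, as the default above states). LineLimit and copyLines are hypothetical names; the text does not show how Unzip.java applies the limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch of a lines= limit; not the actual Unzip.java code.
class LineLimit {

    // Negative values (e.g. -1) are treated as "no limit".
    static long normalize(long lines) {
        return lines < 0 ? Long.MAX_VALUE : lines;
    }

    // Copies at most `lines` lines; each line is re-terminated with '\n'.
    // The count is checked before reading, so no extra line is consumed.
    static long copyLines(BufferedReader in, Writer out, long lines) {
        long max = normalize(lines);
        long count = 0;
        try {
            String line;
            while (count < max && (line = in.readLine()) != null) {
                out.write(line);
                out.write('\n');
                count++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For a compressed input, the reader would wrap a decoding stream, for example new BufferedReader(new InputStreamReader(new GZIPInputStream(...))).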

Examples

Basic Decompression

unzip.sh in=data.fq.gz out=data.fq

Decompresses a gzipped FASTQ file to plain text format.

Compression with Level Control

unzip.sh in=data.fq out=data.fq.gz zl=9

Compresses a FASTQ file with gzip at compression level 9.

Verbose Processing

unzip.sh in=large_file.bz2 out=large_file verbose=t

Decompresses a bzip2 file with verbose output showing progress information.

Limited Line Processing

unzip.sh in=data.gz out=sample.txt lines=1000

Writes only the first 1000 lines of the decompressed data to sample.txt.

Algorithm Details

The unzip tool is a wrapper around standard compression utilities that addresses common usability issues with command-line compression tools. The implementation uses simple I/O streaming rather than implementing custom compression algorithms.
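To illustrate the wrapper approach, here is a hypothetical sketch that builds a pigz command line and runs it with ProcessBuilder, sending the tool's stdout to the output file. The pigz flags used (-c, -d, -p, -<level>) are standard pigz options; whether Unzip.java launches pigz this way, and with which flags, is an assumption, and ExternalCodec is an invented name.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of wrapping an external compressor; assumes pigz is on PATH.
class ExternalCodec {

    // pigz flags: -c write to stdout, -d decompress, -<n> level, -p thread count.
    static List<String> pigzCommand(String input, boolean decompress, int threads, int level) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pigz");
        cmd.add("-c");
        if (decompress) {
            cmd.add("-d");
        } else {
            cmd.add("-" + level);
        }
        cmd.add("-p");
        cmd.add(Integer.toString(threads));
        cmd.add(input); // input file is read, never deleted, because of -c
        return cmd;
    }

    // Runs the command, redirecting stdout to the output file; returns the exit code.
    static int run(List<String> cmd, File output) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectOutput(output);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        return p.waitFor();
    }
}
```

Using -c is what keeps the input file in place; without it, pigz replaces the input with the output by default.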

Processing Strategy

Memory Management

The tool uses little memory (the shell script sets -Xmx80m by default). It reads the input into a single fixed-size buffer with repeated InputStream.read() calls, so memory use stays constant regardless of file size.
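The constant-memory copy loop described above can be sketched as follows. The 64 KB buffer size is an assumption (the text does not give the size), BufferedCopy is a hypothetical name, and GZIPInputStream stands in for whatever decoder sits behind the input stream.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

// Sketch of constant-memory streaming: one buffer, reused for every read().
class BufferedCopy {

    static final int BUFFER_SIZE = 64 * 1024; // assumed size

    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE]; // the only allocation
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) { // -1 signals end of stream
                out.write(buffer, 0, n);          // write only the bytes actually read
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Usage: java BufferedCopy in.gz out   (decompresses a gzip file)
    public static void main(String[] args) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(args[0]));
             OutputStream out = new FileOutputStream(args[1])) {
            System.out.println(copy(in, out) + " bytes written");
        }
    }
}
```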

Error Handling

Before processing, the tool checks that the input and output files are accessible using Tools.testInputFiles() and Tools.testOutputFiles(). IOExceptions raised while reading the stream are caught in processInner(), which sets the errorState flag.
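A hypothetical sketch of the errorState pattern: the exception is caught and recorded rather than rethrown, and the flag is consulted at the end to choose an exit status. Only the names errorState and processInner come from the text above; the class name, buffer size and exit-code mapping are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of flag-based error handling; not the actual Unzip.java code.
class ErrorStateSketch {

    boolean errorState = false;

    void processInner(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            System.err.println("processInner: " + e);
            errorState = true; // record the failure instead of propagating it
        }
    }

    // Assumed mapping: non-zero exit status when any error was recorded.
    int exitCode() {
        return errorState ? 1 : 0;
    }
}
```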

Compatibility

Built on BBTools infrastructure: FileFormat.testInput() and FileFormat.testOutput() for file handling, ByteStreamWriter for output, and PreParser for argument processing. Tools.testForDuplicateFiles() rejects file names that appear more than once, such as an output that would overwrite the input.
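The idea behind a duplicate-file check can be sketched as below. This is not Tools.testForDuplicateFiles(); DuplicateCheck and allDistinct are hypothetical names, and comparing canonical paths is an assumed design choice so that data.gz and ./data.gz count as the same file.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical duplicate-file check; returns true if all non-null paths are distinct.
class DuplicateCheck {

    static boolean allDistinct(String... paths) {
        Set<String> seen = new HashSet<>();
        for (String p : paths) {
            if (p == null) continue; // unset parameters are ignored
            String key;
            try {
                key = new File(p).getCanonicalPath(); // resolves "." and ".." components
            } catch (IOException e) {
                key = new File(p).getAbsolutePath();  // fallback if canonicalization fails
            }
            if (!seen.add(key)) return false;         // add() returns false on a repeat
        }
        return true;
    }
}
```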

Technical Notes

Supported Formats

Performance Characteristics

Integration with BBTools

Uses standard BBTools classes: FileFormat for file type detection, Tools utilities for validation, ReadWrite for compression handling, and Shared for thread configuration. Follows BBTools patterns for argument parsing via PreParser and error handling via errorState flags.

Support

For questions and support: