Cat

Script: cat.sh; Package: fileIO; Class: Concatenate.java

Concatenates and recompresses files. The tool reads multiple input files sequentially and writes everything to a single output file, which allows recompression without going through stdio.

Basic Usage

cat.sh *.fna out=catted.fa.gz

The cat tool accepts multiple input files (either specified with the in= parameter or as bare filenames) and concatenates them into a single output file. It can handle compressed files and recompress the output as needed.
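
For orientation, the end result can be approximated with standard Unix tools. The sketch below is an illustration only, not how cat.sh is implemented: it assumes gzip is installed, uses made-up demo files in a temporary directory, and, unlike cat.sh, passes the data through a pipe.

```shell
# Illustration only: approximates the result of cat.sh with gzip and a pipe.
# The demo files and the temporary directory are hypothetical.
tmp=$(mktemp -d)
printf '>seq1\nACGT\n' > "$tmp/a.fna"
printf '>seq2\nTTGG\n' | gzip -c > "$tmp/b.fna.gz"

# Emit each input in order, decompressing gzipped ones,
# then compress the combined stream once at level 2.
for f in "$tmp/a.fna" "$tmp/b.fna.gz"; do
  case "$f" in
    *.gz) gzip -dc "$f" ;;
    *)    cat "$f" ;;
  esac
done | gzip -2 -c > "$tmp/catted.fa.gz"

# Inspect the combined records.
gzip -dc "$tmp/catted.fa.gz"
```

The inputs keep their order in the output, and the output is compressed once regardless of how each input was stored.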

Parameters

Parameters are grouped by function. The tool has a small set of parameters covering file input/output and compression control.

Standard parameters

in=<file>
Comma-delimited list of input files. Filenames given without the 'in=' prefix are also treated as input files.
out=<file>
Output destination. Defaults to stdout if not specified. The output format will be automatically determined from the file extension.
ziplevel=2
(zl) Set compression level from 1 (lowest/fastest) through 9 (maximum/slowest). Lower compression levels process faster but produce larger files. Default is 2 for a good balance of speed and compression.

Java Parameters

-Xmx
Sets Java's memory usage, overriding autodetection. Examples: -Xmx20g specifies 20 gigabytes of RAM, -Xmx200m specifies 200 megabytes. The maximum is typically 85% of physical memory. Default for this tool is 200m.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92 or later.
-da
Disable Java assertions. May provide a small performance improvement in production use.

Examples

Basic Concatenation

cat.sh file1.fasta file2.fasta file3.fasta out=combined.fasta

Concatenates three FASTA files into a single output file.

Concatenation with Compression

cat.sh *.fastq out=all_reads.fastq.gz ziplevel=6

Concatenates all FASTQ files in the current directory into a compressed output file using compression level 6.

Using Comma-Delimited Input

cat.sh in=sample1.fa,sample2.fa,sample3.fa out=merged.fa.gz

Specifies input files using the in= parameter with comma separation.
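
When the input names come from a script rather than being typed by hand, the comma-delimited value can be assembled in the shell. This is a minimal POSIX sh sketch: the variable name list is hypothetical, the file names are the ones from the example above, and the command is only printed, not run.

```shell
# Put the input names into the positional parameters.
set -- sample1.fa sample2.fa sample3.fa

# "$*" joins the parameters with the first character of IFS;
# the subshell keeps the IFS change local.
list=$(IFS=,; printf '%s' "$*")

# Print the resulting command line for inspection.
echo "cat.sh in=$list out=merged.fa.gz"
```

This approach assumes that no filename contains a comma, since the comma is the delimiter.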

Output to stdout

cat.sh file1.fq file2.fq | other_tool.sh

Concatenates files and pipes output to another tool. When no output file is specified, data goes to stdout.

Algorithm Details

The concatenation tool uses a straightforward sequential processing approach.

Processing Strategy

Memory Usage

The tool is designed for a minimal memory footprint.

Compression Handling

Format detection and conversion are automatic.

Performance Characteristics

Support

For questions and support: