MergeSam

Script: mergesam.sh; Package: jgi; Class: MergeSam.java

Concatenates SAM files, keeping the header from the first file only and appending the alignment records from every file. Header lines found in subsequent files are filtered out of the main output.

Basic Usage

mergesam.sh <files> out=<file>

Input files can be specified as positional arguments or using the in= parameter. If no output file is specified, results are written to stdout.

Parameters

MergeSam accepts standard BBTools parameters for input/output handling and processing control.

Core Parameters

in=<file>
Input SAM file(s). Multiple files can be specified as a comma-separated list or as positional arguments. A positional argument is treated as an input if it is the path of an existing file.
out=stdout.sam
Output file for merged SAM data. Default is stdout. Use 'null' to disable output.
invalid=<file>
Optional output file for invalid lines (headers found after the first file). Lines that don't pass validation are written here instead of the main output.
lines=<long>
Maximum number of lines to process. Set to -1 or omit the parameter to process all lines (the default).
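The rules described for in= and positional arguments can be illustrated with a small Python sketch. It is written from the parameter descriptions above, not from MergeSam.java; the function name resolve_inputs and the choice to raise an error on an unrecognized token are assumptions made for illustration.

```python
import os

def resolve_inputs(args):
    """Split command-line tokens into input paths and key=value options.

    Assumed rules, taken from the parameter descriptions only:
      - in=a.sam,b.sam contributes each comma-separated path
      - a bare token naming an existing file is treated as an input
      - any other key=value token is kept as an option
    """
    inputs, options = [], {}
    for token in args:
        key, sep, value = token.partition("=")
        if sep and key == "in":
            inputs.extend(p for p in value.split(",") if p)
        elif sep:
            options[key] = value
        elif os.path.exists(token):
            inputs.append(token)
        else:
            # Assumption: the sketch rejects anything it cannot classify.
            raise ValueError(f"unrecognized argument: {token}")
    return inputs, options
```

Called with ["file1.sam", "in=file2.sam,file3.sam", "out=merged.sam"], and assuming file1.sam exists, this returns the three inputs in order plus an options dictionary containing the out entry.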

Processing Parameters

verbose=f
Enable verbose output showing detailed processing information. Affects multiple internal components including file readers and writers.
overwrite=t
Allow overwriting of existing output files. Default is true.
append=f
Append to existing output files instead of overwriting. Default is false.

Java Parameters

-da
Disable Java assertions for improved performance in production environments.

Examples

Basic SAM File Merging

mergesam.sh file1.sam file2.sam file3.sam out=merged.sam

Merges three SAM files into a single output file, keeping only the header from file1.sam.

Using Input Parameter

mergesam.sh in=file1.sam,file2.sam,file3.sam out=merged.sam

Alternative syntax using the in= parameter to specify multiple input files.

Writing to Standard Output

mergesam.sh *.sam > merged.sam

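Merges all SAM files in the current directory, writing results to stdout and redirecting them to a file. The header is taken from whichever file the shell's glob expansion lists first. On a rerun, an existing merged.sam would itself match *.sam, so write the output to a different directory or extension if the command may be repeated.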

Handling Invalid Headers

mergesam.sh file1.sam file2.sam out=merged.sam invalid=rejected_headers.sam

Merges files while saving any invalid header lines (headers from files after the first) to a separate file.
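After a merge, it can be useful to confirm that the main output contains header lines only at the top. The following Python sketch is a hypothetical check, not part of BBTools; it assumes header lines are exactly those starting with '@', as in the SAM format.

```python
def headers_only_at_top(lines):
    """Return True if no '@' header line appears after the first alignment record."""
    seen_record = False
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            if seen_record:
                return False  # a header turned up after alignment data
        else:
            seen_record = True
    return True

# Example use against the merged output:
#   with open("merged.sam") as handle:
#       print(headers_only_at_top(handle))
```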

Limited Processing

mergesam.sh in=large_file.sam out=sample.sam lines=1000

Processes only the first 1000 lines of the input file, useful for testing or sampling.

Algorithm Details

MergeSam concatenates SAM files in a streaming fashion, reading and writing one line at a time, and resolves header conflicts by keeping only the first file's header.

Header Processing Strategy

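The tool uses a header mode flag that is true while the first file is being processed. In that mode, header lines are written to the main output. For every later file, header lines are filtered out of the main output and, if invalid= is set, written to that file instead.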
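The flag-based filtering can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code in MergeSam.java: the name merge_sam is hypothetical, header lines are taken to be those starting with '@', and the invalid list stands in for the file given by invalid=.

```python
def merge_sam(files, invalid=None):
    """Yield merged lines from an ordered list of line iterables.

    files:   one iterable of text lines per input file, first file first
    invalid: optional list that collects headers rejected from later files
    """
    header_mode = True  # true only while the first file is being read
    for lines in files:
        for line in lines:
            is_header = line.startswith("@")
            if is_header and not header_mode:
                if invalid is not None:
                    invalid.append(line)
                continue  # keep later headers out of the main output
            yield line
        header_mode = False  # every file after the first has its headers filtered

first = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000",
         "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII"]
second = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000",
          "r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tTTTT\tIIII"]
rejected = []
merged = list(merge_sam([first, second], invalid=rejected))
```

Here merged holds the two header lines of the first file followed by both alignment records, and rejected holds the two header lines of the second file.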

Line-by-Line Processing

The algorithm processes the input files sequentially, one at a time, reading each through a ByteFile reader.
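As a rough Python analogue of sequential, line-at-a-time reading, the sketch below opens each path in turn and copies lines to a binary output stream without holding a whole file in memory. It does not use or reproduce ByteFile; the name stream_merge and the handling of a missing final newline are assumptions for illustration.

```python
import sys

def stream_merge(paths, out=None):
    """Copy SAM lines from each path to `out`, one file at a time."""
    if out is None:
        out = sys.stdout.buffer  # default to stdout
    for index, path in enumerate(paths):
        with open(path, "rb") as handle:
            for line in handle:  # iterates line by line, never loading the whole file
                if index > 0 and line.startswith(b"@"):
                    continue  # header line from a file after the first
                if not line.endswith(b"\n"):
                    line += b"\n"  # keep the last record from fusing with the next file
                out.write(line)
```

Working on bytes rather than decoded text avoids any dependence on character encoding, which suits a format that is copied through unchanged.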

Memory Efficiency

The implementation is designed for large-scale SAM file processing:

Error Handling

Robust error handling ensures data integrity:

Performance Characteristics

Technical Notes

SAM Format Considerations

When merging SAM files, be aware of potential compatibility issues:
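One issue follows directly from keeping only the first header: if a later file was aligned against a different reference, its records will name sequences that the surviving header does not describe. The Python sketch below is a hypothetical pre-merge check, not a MergeSam feature; it compares the @SQ lines (SN and LN tags, as defined by the SAM format) of each input and deliberately requires the same entries in the same order.

```python
def reference_sequences(lines):
    """Collect (name, length) pairs from the @SQ header lines of one SAM file."""
    refs = []
    for line in lines:
        if not line.startswith("@"):
            break  # the header section is over
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        refs.append((tags.get("SN"), tags.get("LN")))
    return refs

def same_references(files):
    """Return True if every file declares the same @SQ entries in the same order."""
    headers = [reference_sequences(lines) for lines in files]
    return all(h == headers[0] for h in headers[1:])
```

Each element of files is an iterable of text lines, for example an open file handle. A False result suggests the inputs should be reconciled before merging.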

Best Practices

Support

For questions and support: