BamLineStreamer

Script: bamlinestreamer.sh
Package: stream.bam
Class: Bam2Sam.java

Converts BAM (Binary Alignment/Map) files to SAM (Sequence Alignment/Map) text format. Reads BGZF-compressed BAM files and writes tab-delimited SAM output. This tool is useful for viewing BAM contents in a human-readable format or for processing them with text-based tools.

Basic Usage

bamlinestreamer.sh <input.bam> <output.sam>

BamLineStreamer converts binary BAM alignment files to text-based SAM format. The tool reads the BAM header and all alignment records, preserving all fields including read names, flags, positions, CIGAR strings, sequences, and quality scores.

Parameters

Parameters control input/output locations and Java runtime settings.

Standard Parameters

in=<file>
Input BAM file (first positional argument). Must be a valid BGZF-compressed BAM format file.
out=<file>
Output SAM file (second positional argument). Will be written in standard SAM text format with tab-delimited fields.
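Before running a conversion, it can help to confirm that the input really is BGZF-compressed. The following is a minimal Python sketch, not part of the tool; the function name looks_like_bgzf is hypothetical, and it assumes the BGZF "BC" subfield is the first gzip extra subfield, which is the usual layout. It is tested here against the 28-byte BGZF end-of-file marker from the SAM/BAM specification.

```python
# The 28-byte BGZF end-of-file marker (an empty BGZF block).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"   # gzip magic, CM, FLG, MTIME, XFL, OS, XLEN
    "4243" "0200" "1b00"                  # 'B','C', SLEN=2, BSIZE=27
    "0300" "00000000" "00000000"          # empty deflate block, CRC32, ISIZE
)


def looks_like_bgzf(first_bytes: bytes) -> bool:
    """Return True if the bytes start with a BGZF block header.

    Assumption: the 'BC' subfield comes first in the gzip extra field.
    """
    if len(first_bytes) < 14:
        return False
    return (
        first_bytes[0:2] == b"\x1f\x8b"      # gzip magic number
        and first_bytes[2] == 8               # compression method: deflate
        and (first_bytes[3] & 4) != 0         # FLG.FEXTRA bit is set
        and first_bytes[12:14] == b"BC"       # BGZF subfield identifier
    )


def file_looks_like_bgzf(path: str) -> bool:
    """Read the first 14 bytes of a file and apply the header check."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(14))
```

A plain SAM text file or an ordinary gzip file without the extra field would fail this check, which is a quick way to catch a mislabeled input before starting a long run.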

Java Parameters

-Xmx
Set Java's memory usage, overriding the default. -Xmx20g will specify 20 GB of RAM, and -Xmx200m will specify 200 MB. The max is typically 85% of physical memory. Default: 8g (fixed allocation).
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.
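As a worked example of the -Xmx arithmetic, the sketch below converts a flag such as -Xmx20g into a byte count, using the JVM convention that k, m, and g are binary multiples (1024-based). The helper name xmx_to_bytes is hypothetical and exists only for illustration; it does not reflect how the wrapper script itself parses the flag.

```python
import re

# Binary multipliers as interpreted by the JVM for -Xmx suffixes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_to_bytes(flag: str) -> int:
    """Convert a flag like '-Xmx20g' or '-Xmx200m' to a number of bytes."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError(f"not a recognized -Xmx flag: {flag!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


if __name__ == "__main__":
    for example in ("-Xmx20g", "-Xmx200m", "-Xmx8g"):
        print(example, "=", xmx_to_bytes(example), "bytes")
```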

Examples

Basic Conversion

bamlinestreamer.sh aligned.bam aligned.sam

Convert a BAM file to SAM format using positional arguments.

Using Named Parameters

bamlinestreamer.sh in=aligned.bam out=aligned.sam

Convert using explicit parameter names for clarity.

Large File Conversion

bamlinestreamer.sh in=large.bam out=large.sam -Xmx32g

Process a large BAM file with increased memory allocation.

Algorithm Details

Processing Pipeline

  1. BAM Reader Initialization: Creates BamLineStreamer with 4 threads for parallel processing and enables header saving
  2. Header Extraction: Waits for header to be populated, then makes thread-safe copy to avoid concurrent modification
  3. Header Output: Writes all header lines (starting with @) to output file
  4. Alignment Processing: Reads alignment records in batches using multi-threaded streaming
  5. SAM Conversion: Converts each SamLine object to tab-delimited text format using toText() method
  6. Statistics Reporting: Reports total number of alignments converted
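The steps above can be sketched in simplified form as follows. This is a single-threaded Python illustration of the header-then-batches flow, not the tool's Java implementation: the names batches and write_sam are hypothetical, records are assumed to be already-formatted SAM lines, and the 4-thread decoding stage is omitted.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def batches(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_sam(header: List[str], records: Iterable[str],
              out: TextIO, batch_size: int = 2) -> int:
    """Write header lines, then records batch by batch; return the record count."""
    header_copy = list(header)          # snapshot, so later changes cannot affect output
    for line in header_copy:
        out.write(line + "\n")
    count = 0
    for batch in batches(records, batch_size):
        for record in batch:            # only one batch is held in memory at a time
            out.write(record + "\n")
            count += 1
    return count


if __name__ == "__main__":
    sink = io.StringIO()
    total = write_sam(
        ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000"],
        (f"r{i}\t0\tchr1\t{i}\t60\t4M\t*\t0\t0\tACGT\tIIII" for i in range(1, 6)),
        sink,
    )
    print(f"Converted {total} alignments")
```

Because records arrive through a generator and are consumed one batch at a time, peak memory in this sketch depends on the batch size rather than on the total number of alignments.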

Memory Requirements

Memory usage is modest for most files. The default 8GB allocation handles typical BAM files efficiently. The multi-threaded streaming design processes records in batches rather than loading the entire file into memory.

SAM Format Output

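Output follows the standard SAM format specification, with 11 mandatory tab-delimited fields per alignment line: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL.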

All optional tags from the BAM file are preserved in the SAM output.
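To check the converted output with text-based tools, a SAM alignment line can be split into its mandatory columns and trailing optional tags. The sketch below is a minimal Python example; the function name parse_sam_line and the sample record are made up for illustration, and the column names follow the SAM specification.

```python
from typing import Dict, List, Tuple

# The 11 mandatory SAM columns, in order, as named in the SAM specification.
MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
             "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> Tuple[Dict[str, str], List[str]]:
    """Split one alignment line into mandatory fields and optional tags."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(MANDATORY):
        raise ValueError(f"expected at least 11 columns, got {len(columns)}")
    fields = dict(zip(MANDATORY, columns[:11]))
    tags = columns[11:]                 # optional TAG:TYPE:VALUE entries
    return fields, tags


if __name__ == "__main__":
    # Hypothetical record used only for demonstration.
    sample = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n"
    fields, tags = parse_sam_line(sample)
    print(fields["QNAME"], fields["POS"], fields["CIGAR"], tags)
```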

Performance Considerations

The tool uses 4 threads for BAM decompression and parsing. For very large files, use -Xmx to allocate more memory if the default allocation proves insufficient.

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.