BamLineStreamer
Converts BAM (Binary Alignment/Map) files to SAM (Sequence Alignment/Map) text format. Reads BGZF-compressed BAM files and writes tab-delimited SAM text. This tool is useful for viewing BAM contents in a human-readable format or for processing them with text-based tools.
Basic Usage
bamlinestreamer.sh <input.bam> <output.sam>
BamLineStreamer converts binary BAM alignment files to text-based SAM format. The tool reads the BAM header and all alignment records, preserving all fields including read names, flags, positions, CIGAR strings, sequences, and quality scores.
Parameters
Parameters control input/output locations and Java runtime settings.
Standard Parameters
- in=<file>: Input BAM file (first positional argument). Must be a valid BGZF-compressed BAM file.
- out=<file>: Output SAM file (second positional argument). Written in standard SAM text format with tab-delimited fields.
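To illustrate what "BGZF-compressed" means in practice, the following Python sketch checks whether the first bytes of a file carry a BGZF gzip header: the gzip magic bytes, the extra-field flag, and a "BC" subfield, as laid out in the SAM/BAM specification. The function name is_bgzf_header is hypothetical and is not part of BamLineStreamer; the tool's own validation may differ.

```python
import struct

# The 28-byte empty BGZF block that normally terminates a BAM file,
# written out piece by piece so each header field is visible.
BGZF_EOF = bytes.fromhex(
    "1f8b0804"      # ID1, ID2, CM=8 (deflate), FLG=4 (extra field present)
    "00000000"      # MTIME
    "00ff"          # XFL, OS
    "0600"          # XLEN = 6
    "4243" "0200"   # subfield 'B','C' with SLEN = 2
    "1b00"          # BSIZE = total block size - 1 = 27
    "0300"          # empty raw-deflate stream
    "00000000"      # CRC32 of empty payload
    "00000000"      # ISIZE = 0
)


def is_bgzf_header(head: bytes) -> bool:
    """Return True if `head` starts with a gzip header holding a BGZF 'BC' subfield."""
    if len(head) < 18 or head[:4] != b"\x1f\x8b\x08\x04":
        return False
    xlen = struct.unpack_from("<H", head, 10)[0]
    extra = head[12:12 + xlen]
    pos = 0
    # Walk the gzip extra subfields: SI1, SI2, SLEN, then SLEN bytes of data.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False
```

To check a file, pass its leading bytes, for example the result of reading the first 64 bytes in binary mode. A plain gzip file fails the check because its FLG byte normally lacks the extra-field bit.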
Java Parameters
- -Xmx: Set Java's maximum heap size, overriding the default. -Xmx20g specifies 20 GB of RAM and -Xmx200m specifies 200 MB. The maximum is typically 85% of physical memory. Default: 8g (fixed allocation).
- -eoom: Exit the process if an out-of-memory exception occurs. Requires Java 8u92+.
- -da: Disable assertions.
Examples
Basic Conversion
bamlinestreamer.sh aligned.bam aligned.sam
Convert a BAM file to SAM format using positional arguments.
Using Named Parameters
bamlinestreamer.sh in=aligned.bam out=aligned.sam
Convert using explicit parameter names for clarity.
Large File Conversion
bamlinestreamer.sh in=large.bam out=large.sam -Xmx32g
Process a large BAM file with increased memory allocation.
Algorithm Details
Processing Pipeline
- BAM Reader Initialization: Creates BamLineStreamer with 4 threads for parallel processing and enables header saving
- Header Extraction: Waits for the header to be populated, then makes a thread-safe copy to avoid concurrent modification
- Header Output: Writes all header lines (starting with @) to output file
- Alignment Processing: Reads alignment records in batches using multi-threaded streaming
- SAM Conversion: Converts each SamLine object to tab-delimited text format using toText() method
- Statistics Reporting: Reports total number of alignments converted
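The steps above can be sketched in Python as a rough model of the control flow. This is not the tool's Java code: write_sam and to_text are hypothetical names, records are modeled as plain tuples of fields, and the batches are assumed to arrive from some upstream reader.

```python
import io
from typing import Iterable, Sequence, TextIO


def to_text(record: Sequence[object]) -> str:
    """Stand-in for a record's text conversion: join fields with tabs."""
    return "\t".join(str(field) for field in record)


def write_sam(header_lines: Iterable[str],
              record_batches: Iterable[Iterable[Sequence[object]]],
              out: TextIO) -> int:
    """Write header lines, then every record batch; return the record count."""
    # Snapshot the header so later changes to the source list cannot affect output.
    header_copy = list(header_lines)
    for line in header_copy:
        if line.startswith("@"):  # only '@' lines are treated as header lines
            out.write(line + "\n")
    converted = 0
    for batch in record_batches:
        for record in batch:
            out.write(to_text(record) + "\n")
            converted += 1
    return converted


if __name__ == "__main__":
    sink = io.StringIO()
    header = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000"]
    batch = [("r1", 0, "chr1", 1, 60, "4M", "*", 0, 0, "ACGT", "IIII")]
    total = write_sam(header, [batch], sink)
    print(f"Converted {total} alignments")
```

Returning the count from the writer keeps the final statistics step a single line at the call site.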
Memory Requirements
Memory usage is modest for most files because the multi-threaded streaming design processes records in batches rather than loading the entire file into memory. The default 8 GB allocation is sufficient for typical BAM files.
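A minimal Python sketch of batch-wise streaming shows why memory stays bounded: only one batch is held at a time, whatever the input size. The helper name iter_batches and the batch size are assumptions for illustration, not values taken from the tool.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items from `records`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Because the input is consumed lazily, the source can be a generator over a file of any size.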
SAM Format Output
Output follows the standard SAM format specification, with 11 mandatory fields:
- QNAME: Query template name (read name)
- FLAG: Bitwise flags indicating read properties
- RNAME: Reference sequence name
- POS: 1-based leftmost mapping position
- MAPQ: Mapping quality (Phred-scaled)
- CIGAR: CIGAR string describing alignment
- RNEXT: Reference name of mate/next read
- PNEXT: Position of mate/next read
- TLEN: Template length (insert size)
- SEQ: Segment sequence
- QUAL: ASCII-encoded base qualities (Phred+33)
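For downstream text processing, the field list above maps directly onto a tab-split of each alignment line. The following Python sketch parses one line into the 11 mandatory fields and decodes QUAL using the Phred+33 offset from the SAM specification. SamRecord, parse_sam_line, and phred_scores are hypothetical names, and the sample line is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str] = field(default_factory=list)  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into its mandatory fields plus optional tags."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(f)}")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], f[11:])


def phred_scores(qual: str) -> List[int]:
    """Decode a QUAL string (Phred+33); '*' means qualities are absent."""
    return [] if qual == "*" else [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    rec = parse_sam_line("r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tNM:i:0")
    print(rec.qname, rec.pos, bool(rec.flag & 0x10), phred_scores(rec.qual))
```

Individual FLAG bits can be tested with a bitwise AND, for example `rec.flag & 0x10` for reverse-strand reads.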
All optional tags from the BAM file are preserved in the SAM output.
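Optional tags appear after the eleventh column as TAG:TYPE:VALUE fields. A small Python sketch for reading them, with the hypothetical helper name parse_tag, might look like this. Only the integer and float types are converted; the remaining types are left as text for simplicity.

```python
from typing import Tuple, Union


def parse_tag(field: str) -> Tuple[str, Union[int, float, str]]:
    """Split a TAG:TYPE:VALUE field and convert numeric values."""
    # Split at most twice so that colons inside the value are preserved.
    tag, typ, value = field.split(":", 2)
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    return tag, value  # A, Z, H and B values are kept as plain text here
```

Limiting the split to two separators matters for string tags such as read-group identifiers, which may themselves contain colons.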
Performance Considerations
The tool uses 4 threads for BAM decompression and parsing, which provides good throughput for typical workloads. For very large files, raise the heap limit with -Xmx if the default allocation proves insufficient.
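Multi-threaded decompression is possible because each BGZF block is an independent gzip member. The Python sketch below illustrates the idea with a thread pool. It is not the tool's Java implementation: all function names are hypothetical, and it assumes every block uses the common layout in which the "BC" subfield is the only extra field (XLEN = 6).

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF block (used here only to create demo input)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    bsize = 12 + 6 + len(cdata) + 8 - 1  # total block size minus 1
    header = struct.pack("<4BI2BH2BHH", 31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + cdata + trailer


def split_blocks(data: bytes) -> List[bytes]:
    """Cut a BGZF stream into blocks using the BSIZE field at offset 16."""
    blocks, pos = [], 0
    while pos < len(data):
        bsize = struct.unpack_from("<H", data, pos + 16)[0]
        blocks.append(data[pos:pos + bsize + 1])
        pos += bsize + 1
    return blocks


def inflate_block(block: bytes) -> bytes:
    """Decompress one block and verify its CRC32 and length trailer."""
    payload = zlib.decompress(block[18:-8], -15)
    crc, isize = struct.unpack("<II", block[-8:])
    if zlib.crc32(payload) != crc or len(payload) != isize:
        raise ValueError("corrupt BGZF block")
    return payload


def inflate_parallel(data: bytes, threads: int = 4) -> bytes:
    """Decompress all blocks on a thread pool; map() preserves block order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return b"".join(pool.map(inflate_block, split_blocks(data)))
```

The default of 4 workers mirrors the thread count described above. Actual speedup depends on block sizes and on how much time is spent outside decompression.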
Support
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.