StreamSam

Script: streamsam.sh Package: stream Class: SamStreamerWrapper.java

Converts SAM/BAM to FASTQ rapidly using multiple threads. BAM files require samtools or sambamba in the PATH.

Basic Usage

streamsam.sh in=<file> out=<file>

StreamSam converts SAM/BAM alignment files to FASTQ format using multiple threads via the SamReadStreamer class. When ordered=false, processing uses SamStreamer.DEFAULT_THREADS. Filtering is provided by the SamFilter class, which extracts reads based on mapping quality, genomic coordinates, alignment flags, and contig names.
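
To make the conversion concrete, here is a minimal Python sketch of turning one SAM alignment line into a FASTQ record. It is an illustration only, not the SamReadStreamer code: the function name `sam_line_to_fastq` is hypothetical, and it assumes the common convention that reads flagged as reverse-strand (0x10) are reverse-complemented back to their original orientation.

```python
# Illustrative sketch only; sam_line_to_fastq is a hypothetical helper,
# not part of StreamSam.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one tab-delimited SAM alignment line into a 4-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    qname, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    # SAM stores reverse-strand alignments (flag bit 0x10) reverse-complemented,
    # so restore the original read orientation (assumed behavior).
    if flag & 0x10:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{qname}\n{seq}\n+\n{qual}\n"


forward = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIHG"
reverse = "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tAACG\tIIHG"
print(sam_line_to_fastq(forward), end="")
print(sam_line_to_fastq(reverse), end="")
```

Only the name, flag, sequence, and quality columns are needed for FASTQ output, which is why a converter can skip most other SAM fields.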

Parameters

Parameters are organized by their function in the SAM/BAM to FASTQ conversion process.

Input/Output Parameters

in=<file>
Input SAM or BAM file. Can be stdin.
out=<file>
Output FASTQ file. Can be stdout.
ref=<file>
Optional reference file. When provided, loads reference using ScafMap.loadReference() and sets RNAME_AS_BYTES=false for string-based contig name processing.

Filtering Parameters

minpos=
Minimum genomic position for coordinate-based filtering. Alignments that do not overlap the range from minpos to maxpos are ignored.
maxpos=
Maximum genomic position for coordinate-based filtering. Alignments that do not overlap the range from minpos to maxpos are ignored.
minmapq=
Ignore alignments with mapping quality (MAPQ) below this threshold. Higher values select more confidently mapped reads.
maxmapq=
Ignore alignments with mapping quality (MAPQ) above this threshold. Useful for selecting poorly mapped reads.
contigs=
Comma-delimited list of contig names to include. Names should contain no spaces; use underscores in place of spaces. Contig name matching is handled by the SamFilter class.
mapped=t
Include mapped reads. Set to false to exclude reads with valid alignments.
unmapped=t
Include unmapped reads. Set to false to exclude unaligned reads.
secondary=f
Include secondary alignments. Secondary alignments represent alternative mapping locations for multi-mapping reads.
supplimentary=t
Include supplementary alignments. Supplementary alignments represent chimeric alignments or split reads.
lengthzero=f
Include alignments without bases. Controls whether zero-length alignments are retained in the output.
invert=f
Invert SAM filters. When true, selects reads that would normally be excluded by the filtering criteria.
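
The way these options combine can be sketched in Python as follows. This is a hedged illustration, not the SamFilter implementation: the class `FilterSketch` is hypothetical, the flag bits come from the SAM specification, and it assumes that the MAPQ bounds apply to every record and that `invert` flips the final decision. The contig, position, and zero-length checks are omitted for brevity.

```python
from dataclasses import dataclass

# Standard SAM flag bits (SAM specification).
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


@dataclass
class FilterSketch:
    """Hypothetical stand-in for the documented options, with the listed defaults."""
    minmapq: int = 0            # assumption: no lower bound unless set
    maxmapq: int = 255          # assumption: no upper bound unless set
    mapped: bool = True
    unmapped: bool = True
    secondary: bool = False
    supplimentary: bool = True  # spelled as in the documented option
    invert: bool = False

    def passes(self, flag, mapq):
        # Mapped/unmapped selection is driven by flag bit 0x4.
        keep = self.unmapped if flag & UNMAPPED else self.mapped
        if flag & SECONDARY and not self.secondary:
            keep = False
        if flag & SUPPLEMENTARY and not self.supplimentary:
            keep = False
        # Assumption: MAPQ bounds are checked for every record.
        if not (self.minmapq <= mapq <= self.maxmapq):
            keep = False
        # invert selects exactly the records that would otherwise be dropped.
        return keep != self.invert


default = FilterSketch()
print(default.passes(flag=0, mapq=60))
print(default.passes(flag=SECONDARY, mapq=60))
```

With the defaults, a primary mapped alignment passes and a secondary alignment does not; setting `invert=True` reverses both outcomes.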

Processing Parameters

ordered=t
Keep reads in input order. Setting this to false is faster, because multiple threads are used more efficiently, but may change read order.
verbose=f
Print verbose progress information during processing.
reads=
Process at most this many reads. Also accepts 'maxreads'. Useful for testing or processing subsets.
forceparse=f
Force full parsing of SAM optional fields even when not needed for output format. Increases accuracy but reduces speed.
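
The trade-off behind ordered=t versus ordered=f can be illustrated with a generic Python thread pool. This is a sketch of the general technique only, not StreamSam's threading code: `convert`, `run_ordered`, and `run_unordered` are hypothetical names, and the per-read work is a trivial placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert(record):
    """Hypothetical stand-in for the per-read conversion work."""
    return record.upper()


def run_ordered(records, threads=4):
    # pool.map yields results in input order, so a slow early record
    # delays the emission of later records that are already finished.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(convert, records))


def run_unordered(records, threads=4):
    # as_completed yields results as soon as each finishes, so output
    # order may differ from input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(convert, r) for r in records]
        return [f.result() for f in as_completed(futures)]


reads = ["acgt", "ttga", "ggcc", "atat"]
print(run_ordered(reads))
print(sorted(run_unordered(reads)))
```

Both variants produce the same set of results; only the ordered variant guarantees that the output sequence matches the input sequence.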

SAM Version Parameters

samversion=1.4
SAM format version to use for output. Also accepts 'samv' or 'sam'. Affects CIGAR string formatting.

Advanced Filtering Parameters

minid=0.0
Minimum alignment identity (0.0-1.0). Values >1 are interpreted as percentages and divided by 100.
maxid=1.0
Maximum alignment identity (0.0-1.0). Values >1 are interpreted as percentages and divided by 100.
duplicate=t
Include duplicate reads (reads marked as PCR or optical duplicates).
qfail=f
Include reads that failed quality checks (as marked in SAM flags).
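
As a rough Python illustration of these options, the sketch below normalizes identity thresholds the way the descriptions state (values above 1 are treated as percentages) and checks the duplicate and quality-fail flag bits from the SAM specification. The function names are hypothetical, and how StreamSam computes an alignment's identity is not covered here; the sketch takes identity as an input.

```python
# Standard SAM flag bits (SAM specification).
QFAIL, DUPLICATE = 0x200, 0x400


def normalize_identity(value):
    """Treat values above 1 as percentages, as described for minid and maxid."""
    return value / 100.0 if value > 1 else value


def passes_advanced(flag, identity, minid=0.0, maxid=1.0,
                    duplicate=True, qfail=False):
    """Hypothetical check mirroring the documented defaults."""
    if flag & DUPLICATE and not duplicate:
        return False
    if flag & QFAIL and not qfail:
        return False
    low, high = normalize_identity(minid), normalize_identity(maxid)
    return low <= identity <= high


print(normalize_identity(95))
print(passes_advanced(flag=0, identity=0.97, minid=95))
```

Under this reading, minid=95 and minid=0.95 are equivalent.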

Examples

Basic SAM to FASTQ Conversion

streamsam.sh in=alignments.sam out=reads.fastq

Converts a SAM file to FASTQ format, retaining all reads (mapped and unmapped).

BAM to FASTQ with Quality Filter

streamsam.sh in=alignments.bam out=high_quality.fastq minmapq=20

Converts BAM to FASTQ, keeping only reads with mapping quality ≥20. Requires samtools or sambamba in PATH for BAM input.

Extract Unmapped Reads Only

streamsam.sh in=alignments.bam out=unmapped.fastq mapped=f

Extracts only unmapped reads from a BAM file, useful for recovering unaligned sequences for further analysis.

Coordinate-Based Filtering

streamsam.sh in=alignments.sam out=region_reads.fastq contigs=chr1,chr2 minpos=1000000 maxpos=2000000

Extracts reads mapping to chromosomes 1 and 2 within the coordinate range 1,000,000 to 2,000,000.
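
The overlap test implied by minpos and maxpos can be sketched in Python. This is an assumption-laden illustration rather than the SamFilter code: `reference_span` and `overlaps` are hypothetical helpers, positions are taken as 1-based and inclusive (as in SAM), and the alignment end is derived from the CIGAR operations that consume reference bases.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers on the reference."""
    length = sum(int(n) for n, op in _CIGAR_OP.findall(cigar)
                 if op in _REF_CONSUMING)
    # Assumption: an alignment with no reference-consuming ops covers one position.
    return pos, pos + max(length, 1) - 1


def overlaps(pos, cigar, minpos, maxpos):
    """True if the alignment shares at least one base with [minpos, maxpos]."""
    start, end = reference_span(pos, cigar)
    return start <= maxpos and end >= minpos


print(reference_span(999990, "20M"))
print(overlaps(999990, "20M", 1000000, 2000000))
```

Because the test is overlap rather than containment, a read that starts before minpos but extends into the range is still retained under this sketch.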

High-Throughput Processing

streamsam.sh in=large_alignment.bam out=reads.fastq ordered=f

Disables ordered output (ordered=f) so that threads are used more efficiently. Read order in the output may differ from the input, but processing is faster.

Primary Alignments Only

streamsam.sh in=alignments.bam out=primary.fastq secondary=f supplimentary=f

Extracts only primary alignments, excluding secondary and supplementary alignments for cleaner downstream analysis.

Algorithm Details

StreamSam uses a multi-threaded streaming architecture for SAM/BAM to FASTQ conversion:

Threading Strategy

Memory Management

SAM Parsing Optimization

Filtering Implementation

Performance Characteristics

Reference Integration

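When a reference file is provided via ref=<file>, StreamSam loads it using ScafMap.loadReference() and sets RNAME_AS_BYTES=false so that contig names are processed as strings.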
