SplitSam

Script: splitsam.sh
Package: jgi
Class: SplitSamFile.java

Splits a SAM file into three files: plus-mapped reads, minus-mapped reads, and unmapped reads. If "header" is given as the 5th argument, SAM header lines are included in the outputs.

Basic Usage

splitsam.sh <input> <plus output> <minus output> <unmapped output> [header]

Input may be stdin or a SAM file, raw or gzipped. Outputs must be SAM files and may be gzipped.
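As an illustration of the input and output rules above, here is a minimal Python sketch of how a file name could be mapped to a stream. It is not the tool's own code: the helper names open_sam_input and open_sam_output are hypothetical, and choosing gzip by the ".gz" suffix is an assumption made for the example.

```python
import gzip
import sys


def open_sam_input(name):
    """Open SAM input as a binary stream.

    Assumption for this sketch: the literal name "stdin" selects standard
    input, and a ".gz" suffix selects gzip decompression.
    """
    if name == "stdin":
        return sys.stdin.buffer
    if name.endswith(".gz"):
        return gzip.open(name, "rb")
    return open(name, "rb")


def open_sam_output(name):
    """Open a SAM output as a binary stream, gzipped if the name ends in .gz."""
    if name.endswith(".gz"):
        return gzip.open(name, "wb")
    return open(name, "wb")
```

Working in binary mode keeps the lines as bytes, which mirrors the byte-array processing described under Algorithm Details.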

Parameters

SplitSam uses positional arguments rather than named parameters:

Positional Arguments

input
    Input SAM file to split. Can be stdin, a regular SAM file, or a gzipped SAM file.
plus output
    Output file for reads mapped to the plus (forward) strand. Must be a SAM file and may be gzipped.
minus output
    Output file for reads mapped to the minus (reverse) strand. Must be a SAM file and may be gzipped.
unmapped output
    Output file for unmapped reads. Must be a SAM file and may be gzipped.
header (optional)
    If the 5th argument is "header", SAM header lines (those starting with @) are included in all three output files. By default, header lines are excluded.
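The positional scheme can be sketched in Python as follows. This is an illustrative sketch only, not the tool's argument handling: SplitArgs and parse_args are hypothetical names, and the exact, case-sensitive match on "header" is an assumption.

```python
from typing import NamedTuple, Sequence


class SplitArgs(NamedTuple):
    input_path: str
    plus_out: str
    minus_out: str
    unmapped_out: str
    include_header: bool


def parse_args(argv: Sequence[str]) -> SplitArgs:
    """Interpret arguments by position: input, three outputs, optional 'header'."""
    if len(argv) < 4:
        raise ValueError(
            "expected: <input> <plus output> <minus output> <unmapped output> [header]"
        )
    # Assumption: only the exact string "header" in 5th position enables headers.
    include_header = len(argv) > 4 and argv[4] == "header"
    return SplitArgs(argv[0], argv[1], argv[2], argv[3], include_header)
```

Because the arguments are positional, swapping the order of the output names silently swaps which reads land in which file, so the order in the usage line matters.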

Examples

Basic splitting without headers

splitsam.sh aligned.sam plus_reads.sam minus_reads.sam unmapped_reads.sam

Splits aligned.sam into three files based on mapping strand, excluding SAM header lines from outputs.

Splitting with headers included

splitsam.sh aligned.sam plus_reads.sam minus_reads.sam unmapped_reads.sam header

Same as above but includes SAM header lines (@SQ, @HD, etc.) in all three output files.

Working with gzipped files

splitsam.sh aligned.sam.gz plus_reads.sam.gz minus_reads.sam.gz unmapped_reads.sam.gz header

Processes a gzipped input file and creates gzipped output files with headers included.

Using stdin input

samtools view input.bam | splitsam.sh stdin plus.sam minus.sam unmapped.sam

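Reads SAM data from stdin (converted from BAM using samtools) and splits it into three files. Note that samtools view emits no header unless -h is given, so the stream in this example contains alignment records only.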

Algorithm Details

Strand Determination Implementation

SplitSam uses the SamLine.parseFlagOnly() method to extract the SAM flag from each line's byte array, then applies the static methods SamLine.mapped() and SamLine.strand() to the flag to classify the read.
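The idea of reading only the flag field and classifying from it can be sketched in Python. This is an illustration under stated assumptions, not the SamLine implementation: parse_flag_only and classify are hypothetical stand-ins, and checking the unmapped bit before the strand bit is an assumed order.

```python
FLAG_UNMAPPED = 0x4   # SAM spec: segment unmapped
FLAG_REVERSE = 0x10   # SAM spec: sequence is reverse complemented


def parse_flag_only(line: bytes) -> int:
    """Return the FLAG (2nd tab-separated field) without parsing the whole line."""
    first_tab = line.index(b"\t")
    second_tab = line.index(b"\t", first_tab + 1)
    return int(line[first_tab + 1:second_tab])


def classify(flag: int) -> str:
    """Map a SAM flag to 'unmapped', 'minus', or 'plus'."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    return "minus" if flag & FLAG_REVERSE else "plus"


line = b"read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
print(classify(parse_flag_only(line)))  # prints "minus"
```

Stopping at the second tab avoids splitting all eleven mandatory fields when only the flag is needed.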

Processing Architecture

The tool reads the input line by line with ByteFile.nextLine() and writes through three concurrent ByteStreamWriter threads, one per output file.
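A streaming reader feeding three writer threads can be sketched in Python with one queue per output. This is a simplified illustration, not the ByteFile or ByteStreamWriter code: split_stream and _writer are hypothetical names, the queue size is arbitrary, and header lines are simply dropped here (the default behavior).

```python
import queue
import threading

_DONE = None  # sentinel telling a writer thread to stop


def _writer(q, out):
    """Drain one queue into one output stream until the sentinel arrives."""
    while True:
        line = q.get()
        if line is _DONE:
            break
        out.write(line)


def split_stream(lines, plus_out, minus_out, unmapped_out):
    """Route each alignment line to one of three outputs; return per-output counts."""
    outs = {"plus": plus_out, "minus": minus_out, "unmapped": unmapped_out}
    queues = {key: queue.Queue(maxsize=1024) for key in outs}
    threads = [
        threading.Thread(target=_writer, args=(queues[key], outs[key]))
        for key in outs
    ]
    for t in threads:
        t.start()
    counts = dict.fromkeys(outs, 0)
    try:
        for line in lines:
            # Skip blank lines and header lines (headers excluded by default).
            if not line.strip() or line.startswith(b"@"):
                continue
            flag = int(line.split(b"\t", 2)[1])
            if flag & 0x4:
                key = "unmapped"
            elif flag & 0x10:
                key = "minus"
            else:
                key = "plus"
            queues[key].put(line)
            counts[key] += 1
    finally:
        # Always signal the writers so the threads terminate.
        for q in queues.values():
            q.put(_DONE)
        for t in threads:
            t.join()
    return counts
```

Only one line is held by the reader at a time, so memory use does not grow with input size, and each output preserves the input order of its reads because a single thread drains each queue.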

Header Processing Logic

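When the includeHeader boolean is true (the 5th argument equals "header"), SAM header lines (those starting with @) are written to all three output files; otherwise they are dropped.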
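The header rule can be sketched in Python as a small helper. The name route_header_line is hypothetical and the sketch only illustrates the documented behavior: header lines go to every output when requested and are dropped otherwise.

```python
def route_header_line(line: bytes, include_header: bool, outputs) -> bool:
    """Handle one line; return True if it was a header line.

    Header lines (starting with '@') are copied to every output when
    include_header is true and are silently dropped otherwise.
    """
    if not line.startswith(b"@"):
        return False  # not a header; the caller classifies it by flag
    if include_header:
        for out in outputs:
            out.write(line)
    return True
```

Copying the header to all three files keeps each output usable on its own by tools that require @SQ lines.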

Performance Implementation

Statistical Output Implementation

Elapsed time is measured with Timer.stop(), and statistics are reported to stderr with System.err.println().
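The pattern of timing a run and reporting to stderr can be sketched in Python. The report layout below is entirely hypothetical and is not the tool's actual output; report_stats and the sample counts are made up for illustration.

```python
import sys
import time


def report_stats(counts, elapsed_seconds, stream=None):
    """Print a timing line and per-category counts (hypothetical layout)."""
    stream = sys.stderr if stream is None else stream
    total = sum(counts.values())
    print(f"Time: {elapsed_seconds:.3f} s", file=stream)
    print(f"Total reads: {total}", file=stream)
    for name, n in counts.items():
        pct = 100.0 * n / total if total else 0.0
        print(f"{name}: {n} ({pct:.2f}%)", file=stream)


if __name__ == "__main__":
    start = time.perf_counter()
    counts = {"plus": 2, "minus": 1, "unmapped": 1}  # made-up numbers
    report_stats(counts, time.perf_counter() - start)
```

Writing statistics to stderr rather than stdout keeps them separate from any data written to standard output.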

Technical Notes

SAM Flag Interpretation

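The tool relies on standard SAM flag bits for classification. In the SAM specification, bit 0x4 marks an unmapped read and bit 0x10 marks a read aligned to the reverse (minus) strand.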
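A few worked flag values may help. The Python sketch below decodes only the two bits from the SAM specification that matter for this split (0x4 unmapped, 0x10 reverse strand); the function name bucket is hypothetical, and treating the unmapped bit as taking priority over the strand bit is an assumption.

```python
FLAG_UNMAPPED = 0x4   # read is unmapped
FLAG_REVERSE = 0x10   # read aligned to the reverse (minus) strand


def bucket(flag: int) -> str:
    """Return the output bucket implied by a SAM flag."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    return "minus" if flag & FLAG_REVERSE else "plus"


# Common flag values and the bits they contain:
#   0  = no bits set                      -> plus
#   16 = 0x10                             -> minus
#   4  = 0x4                              -> unmapped
#   99 = 0x1 + 0x2 + 0x20 + 0x40          -> plus  (0x20 is the mate's strand)
#   83 = 0x1 + 0x2 + 0x10 + 0x40          -> minus
#   77 = 0x1 + 0x4 + 0x8 + 0x40           -> unmapped
for flag in (0, 16, 4, 99, 83, 77):
    print(flag, bucket(flag))
```

Note that bit 0x20 describes the mate's strand, not the read's own, so flag 99 counts as plus-mapped.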

File Format Requirements

Error Handling Implementation
