SplitSam4Way

Script: splitsam4way.sh
Package: jgi
Class: SplitSam4Way.java

Splits the reads in a SAM file into four output files by mapping status: pairs mapped to the plus strand, pairs mapped to the minus strand, chimeric/discordant pairs, and unmapped reads. Useful for assessing mapping quality and for separating the different types of read pairs in an alignment.

Basic Usage

splitsam4way.sh <input> <outplus> <outminus> <outchimeric> <outunmapped>

Takes exactly 5 positional arguments: the input SAM file followed by the 4 output files, in the order shown. Use 'null' for any output file you don't want to generate.
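A minimal Java sketch of the argument convention above: it checks for exactly five arguments and reports which of the four outputs are enabled, treating the literal string 'null' as "skip". The names ArgCheck and enabledOutputs are hypothetical, and how the real tool reacts to a wrong argument count is not documented here; this sketch simply throws.

```java
import java.util.Arrays;

// Hypothetical illustration of the positional-argument convention,
// not code taken from SplitSam4Way.java.
final class ArgCheck {

    /** Returns {plus, minus, chimeric, unmapped}: true if that output will be written. */
    static boolean[] enabledOutputs(String[] args) {
        // Assumption: this sketch rejects any count other than five by throwing.
        if (args.length != 5) {
            throw new IllegalArgumentException("Expected 5 arguments, got " + args.length);
        }
        boolean[] enabled = new boolean[4];
        for (int i = 0; i < 4; i++) {
            // args[0] is the input SAM file; args[1..4] are the four outputs.
            enabled[i] = !"null".equals(args[i + 1]);
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Same argument pattern as the "Extract Only Chimeric Reads" example.
        String[] demo = {"input.sam", "null", "null", "chimeric.sam", "null"};
        System.out.println(Arrays.toString(enabledOutputs(demo))); // [false, false, true, false]
    }
}
```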

Parameters

This tool uses only positional arguments and has no optional parameters. All five arguments are required, but any of the four output files can be set to 'null' to skip writing it.

Positional Arguments

input
Input SAM file containing aligned reads. Headers are preserved and written to all non-null output files.
outplus
Output file for mapped pairs whose first fragment (read 1) is on the plus strand. The strand of read 1 is used for both mates of the pair. Use 'null' to skip.
outminus
Output file for mapped pairs whose first fragment (read 1) is on the minus strand. The strand of read 1 is used for both mates of the pair. Use 'null' to skip.
outchimeric
Output file for chimeric or discordant read pairs: pairs whose mates map to different chromosomes, or to the same strand (same orientation). Use 'null' to skip.
outunmapped
Output file for unmapped and unpaired reads. Includes reads where the read or its mate is unmapped, reads that have no mate, and alignments that are not primary. Use 'null' to skip.

Examples

Basic Read Splitting

splitsam4way.sh input.sam plus.sam minus.sam chimeric.sam unmapped.sam

Splits input.sam into four categories: plus-strand mapped pairs, minus-strand mapped pairs, chimeric/discordant pairs, and unmapped reads.

Skip Unwanted Categories

splitsam4way.sh input.sam plus.sam minus.sam null unmapped.sam

Splits reads into plus, minus, and unmapped categories while skipping chimeric reads (set to 'null').

Extract Only Chimeric Reads

splitsam4way.sh input.sam null null chimeric.sam null

Extracts only chimeric/discordant read pairs, useful for structural variant detection or quality assessment.

Separate Mapped from Unmapped

splitsam4way.sh input.sam mapped_plus.sam mapped_minus.sam chimeric.sam unmapped.sam

Complete four-way separation for downstream analysis of different mapping categories.

Algorithm Details

Classification Logic

SplitSam4Way classifies each read (SAM alignment line) using its SAM flags and its mate's mapping information. The checks below are applied in order, and the first category that matches determines the output file:

1. Header Preservation

All SAM header lines (starting with '@') are copied to every non-null output file, ensuring downstream tools have complete format information.
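A sketch of the header rule, assuming a single streaming pass in which each line is tested as it is read. Disabled outputs are represented here as null writers; that representation and the names HeaderCopy and copyIfHeader are illustrative assumptions, not the tool's code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: copy SAM header lines ('@...') to every enabled output.
final class HeaderCopy {

    /** If line is a header, writes it to each non-null output and returns true. */
    static boolean copyIfHeader(String line, PrintWriter... outs) {
        if (!line.startsWith("@")) {
            return false; // alignment line: left for the classification steps
        }
        for (PrintWriter out : outs) {
            if (out != null) { // null stands for an output set to 'null'
                out.println(line);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringWriter plus = new StringWriter();
        StringWriter unmapped = new StringWriter();
        PrintWriter plusOut = new PrintWriter(plus);
        PrintWriter unmappedOut = new PrintWriter(unmapped);
        // In this demo the minus and chimeric outputs are disabled (null).
        copyIfHeader("@HD\tVN:1.6", plusOut, null, null, unmappedOut);
        copyIfHeader("@SQ\tSN:chr1\tLN:1000", plusOut, null, null, unmappedOut);
        plusOut.flush();
        unmappedOut.flush();
        System.out.print(plus);
        System.out.print(unmapped);
    }
}
```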

2. Unmapped Classification

Reads are classified as unmapped if any of these conditions are true:

  • Either fragment is not mapped (!sl.mapped() || !sl.nextMapped())
  • Read has no mate (!sl.hasMate())
  • Read is not a primary alignment (!sl.primary())

Because these checks run first, a read that meets any of them goes to the unmapped output even if it would otherwise qualify as chimeric or strand-specific.
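The same test can be written directly against SAM FLAG bits. The correspondence used here (mapped() as bit 0x4 clear, nextMapped() as 0x8 clear, hasMate() as 0x1 set, primary() as 0x100 clear) is an assumption based on the SAM specification, not taken from the SamLine source. Whether supplementary alignments (0x800) count as non-primary is not stated above, so this sketch ignores that bit. The class name is hypothetical.

```java
// Hypothetical flag-level reading of the unmapped test; the mapping from
// SamLine methods to FLAG bits is an assumption.
final class UnmappedCheck {
    static final int PAIRED = 0x1;        // assumed meaning of hasMate()
    static final int UNMAPPED = 0x4;      // assumed inverse of mapped()
    static final int MATE_UNMAPPED = 0x8; // assumed inverse of nextMapped()
    static final int SECONDARY = 0x100;   // assumed inverse of primary()

    /** True if a read with this FLAG would go to the unmapped output. */
    static boolean isUnmappedCategory(int flag) {
        boolean mapped = (flag & UNMAPPED) == 0;
        boolean nextMapped = (flag & MATE_UNMAPPED) == 0;
        boolean hasMate = (flag & PAIRED) != 0;
        boolean primary = (flag & SECONDARY) == 0;
        return !mapped || !nextMapped || !hasMate || !primary;
    }

    public static void main(String[] args) {
        // 99: primary pair, both mapped; 73: mate unmapped;
        // 0: unpaired read; 355: secondary copy of a 99 alignment.
        for (int flag : new int[]{99, 73, 0, 355}) {
            System.out.println(flag + " -> " + isUnmappedCategory(flag));
        }
    }
}
```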

3. Chimeric/Discordant Classification

For the remaining reads (primary, paired, with both mates mapped), a read is classified as chimeric if either of these conditions is true:

  • Pair fragments map to different chromosomes (!sl.pairedOnSameChrom())
  • Both fragments map to the same strand (sl.strand() == sl.nextStrand())

Such pairs can indicate structural variants (e.g., translocations or inversions) or mapping artifacts.
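A flag-level sketch of the chimeric test, assuming the read has already passed the unmapped checks. It assumes pairedOnSameChrom() compares RNAME (column 3) with RNEXT (column 7), where an RNEXT of '=' means "same reference" per the SAM specification, and that strand() and nextStrand() correspond to FLAG bits 0x10 and 0x20. The class and method names are hypothetical.

```java
// Hypothetical sketch of the chimeric/discordant test; not the tool's code.
final class ChimericCheck {
    static final int REVERSE = 0x10;      // read is reverse-complemented
    static final int MATE_REVERSE = 0x20; // mate is reverse-complemented

    /** RNEXT "=" means the mate is on the same reference as RNAME. */
    static boolean pairedOnSameChrom(String rname, String rnext) {
        return "=".equals(rnext) || rname.equals(rnext);
    }

    /** True if a mapped, primary, paired read would go to the chimeric output. */
    static boolean isChimeric(int flag, String rname, String rnext) {
        boolean readReverse = (flag & REVERSE) != 0;
        boolean mateReverse = (flag & MATE_REVERSE) != 0;
        boolean sameStrand = readReverse == mateReverse;
        return !pairedOnSameChrom(rname, rnext) || sameStrand;
    }

    public static void main(String[] args) {
        System.out.println(isChimeric(99, "chr1", "="));     // opposite strands, same chrom
        System.out.println(isChimeric(97, "chr1", "chr2"));  // mates on different chromosomes
        System.out.println(isChimeric(65, "chr1", "="));     // both forward
        System.out.println(isChimeric(113, "chr1", "="));    // both reverse
    }
}
```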

4. Strand-Based Classification

Reads that pass both checks (mates on the same chromosome and on opposite strands) are classified by the strand of the pair's first fragment (read 1):

  • Plus strand: (sl.firstFragment() ? sl.strand() : sl.nextStrand()) == PLUS
  • Minus strand: (sl.firstFragment() ? sl.strand() : sl.nextStrand()) == MINUS

Because the strand is always taken from read 1, both mates of a pair are assigned the same strand and written to the same file, regardless of which mate appears first in the SAM file.
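A sketch of the strand rule, assuming FLAG bit 0x40 marks the first fragment and bits 0x10 and 0x20 give the strand of the read and of its mate, as in the SAM specification. The demo shows that both lines of a pair receive the same label: a pair written as flags 99 and 147 is labeled PLUS on both lines, and one written as 83 and 163 is labeled MINUS. The names StrandCheck and strandOfFirstFragment are hypothetical.

```java
// Hypothetical sketch of the strand assignment; not the tool's code.
final class StrandCheck {
    static final int REVERSE = 0x10;      // this read is reverse-complemented
    static final int MATE_REVERSE = 0x20; // the mate is reverse-complemented
    static final int FIRST = 0x40;        // this read is the first fragment (read 1)

    enum Strand { PLUS, MINUS }

    /** Strand of read 1, computed from either mate's FLAG. */
    static Strand strandOfFirstFragment(int flag) {
        boolean isFirst = (flag & FIRST) != 0;
        // Read 1 uses its own strand bit; read 2 uses its mate's strand bit.
        boolean reverse = isFirst ? (flag & REVERSE) != 0 : (flag & MATE_REVERSE) != 0;
        return reverse ? Strand.MINUS : Strand.PLUS;
    }

    public static void main(String[] args) {
        // Each row is one pair: {read 1 FLAG, read 2 FLAG}.
        int[][] pairs = {{99, 147}, {83, 163}};
        for (int[] pair : pairs) {
            System.out.println(pair[0] + " -> " + strandOfFirstFragment(pair[0])
                    + ", " + pair[1] + " -> " + strandOfFirstFragment(pair[1]));
        }
    }
}
```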

Performance Characteristics

Output Statistics

The tool provides detailed runtime statistics upon completion:

Use Cases

Support

For questions and support: