SplitSam6Way

Script: splitsam6way.sh
Package: jgi
Class: SplitSam6Way.java

Splits paired-end SAM reads into six output files according to mate (R1 or R2), mapping status, and strand orientation.

Basic Usage

splitsam6way.sh <input> <r1plus> <r1minus> <r1unmapped> <r2plus> <r2minus> <r2unmapped> [maxreads]

This tool reads a paired-end SAM file and writes each read to one of six categories: plus-strand, minus-strand, or unmapped, separately for R1 and R2.

Parameters

This tool uses positional arguments in a specific order:

Required Arguments

input
Input SAM file to be split. Must be a valid SAM format file.
r1plus
Output file for R1 reads mapped to the plus strand. Use 'null' to skip this output.
r1minus
Output file for R1 reads mapped to the minus strand. Use 'null' to skip this output.
r1unmapped
Output file for R1 reads that are unmapped. Use 'null' to skip this output.
r2plus
Output file for R2 reads mapped to the plus strand. Use 'null' to skip this output.
r2minus
Output file for R2 reads mapped to the minus strand. Use 'null' to skip this output.
r2unmapped
Output file for R2 reads that are unmapped. Use 'null' to skip this output.

Optional Arguments

maxreads
Maximum number of reads to process. Default: unlimited (Long.MAX_VALUE). Accepts K/M/G suffixes for thousands/millions/billions.
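
The suffix handling described above can be illustrated with a short Python sketch. It is not the tool's own parser: the function name parse_kmg is hypothetical, and treating suffixes as case-insensitive and whole-number only are assumptions made for this example.

```python
# Illustrative only: converts a maxreads-style string to an integer.
# Assumptions (not confirmed by the tool's documentation): suffixes are
# matched case-insensitively and the numeric part is a whole number.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_kmg(text: str) -> int:
    """Return the read limit encoded by strings such as '1000000', '500k', or '2M'."""
    s = text.strip().lower()
    if s and s[-1] in MULTIPLIERS:
        # Strip the suffix and scale the remaining digits.
        return int(s[:-1]) * MULTIPLIERS[s[-1]]
    return int(s)
```

Under these assumptions, parse_kmg("2M") and parse_kmg("2000000") describe the same limit.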

Examples

Basic Usage - Split All Categories

splitsam6way.sh input.sam r1_plus.sam r1_minus.sam r1_unmapped.sam r2_plus.sam r2_minus.sam r2_unmapped.sam

Splits the input SAM file into six separate output files based on mate number, mapping status, and strand.

Skip Unwanted Categories

splitsam6way.sh input.sam r1_plus.sam null null r2_plus.sam null null

Outputs only the R1 and R2 reads mapped to the plus strand, skipping minus-strand and unmapped reads.
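
Because the outputs are positional, it is easy to put 'null' in the wrong slot. The following Python sketch assembles the argument list in the documented order and fills skipped categories with 'null'. The helper build_command and its dictionary keys are hypothetical names chosen for this example. The sketch only builds the command and does not run it.

```python
# Illustrative helper: builds the positional argument list for splitsam6way.sh.
# The order below follows the usage line in this document.
OUTPUT_ORDER = ["r1plus", "r1minus", "r1unmapped", "r2plus", "r2minus", "r2unmapped"]


def build_command(input_sam, outputs, maxreads=None):
    """Return the command as a list; categories missing from 'outputs' become 'null'."""
    unknown = set(outputs) - set(OUTPUT_ORDER)
    if unknown:
        raise ValueError(f"unknown output categories: {sorted(unknown)}")
    cmd = ["splitsam6way.sh", input_sam]
    # Every one of the six slots must be present, so skipped ones get 'null'.
    cmd += [outputs.get(name, "null") for name in OUTPUT_ORDER]
    if maxreads is not None:
        cmd.append(str(maxreads))
    return cmd
```

If the script is on the PATH, the returned list could be passed to subprocess.run(cmd, check=True).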

Process Limited Number of Reads

splitsam6way.sh input.sam r1_plus.sam r1_minus.sam r1_unmapped.sam r2_plus.sam r2_minus.sam r2_unmapped.sam 1000000

Processes only the first 1 million reads from the input file.

Extract Unmapped Reads Only

splitsam6way.sh input.sam null null r1_unmapped.sam null null r2_unmapped.sam

Extract only unmapped reads for both R1 and R2, useful for recovering unaligned sequences.

Algorithm Details

SplitSam6Way parses the SAM file as a stream and uses concurrent I/O to categorize paired-end sequencing reads.

Processing Architecture

Read Classification Implementation

Each read undergoes systematic classification using SamLine parsing methods:

  1. SamLine Construction: new SamLine(line) parses the tab-delimited SAM fields into a structured object
  2. Pair Classification: sl.pairnum()==0 identifies R1 reads; any non-zero value indicates an R2 read
  3. Mapping Detection: sl.mapped() evaluates the FLAG field bits to determine alignment status
  4. Strand Analysis: sl.strand() is compared against the Shared.PLUS constant to identify the forward or reverse strand
  5. Conditional Routing: a nested if-else structure routes each read to the appropriate ByteStreamWriter based on its classification
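
The classification steps above can be sketched in Python using the standard SAM FLAG bits. This is an approximation of the logic, not the Java implementation. It assumes that the mate number follows the "second in pair" bit (0x80), that mapping status follows the "unmapped" bit (0x4), and that strand follows the "reverse complemented" bit (0x10). The function name classify and the category labels are hypothetical.

```python
# Standard SAM FLAG bits from the SAM specification.
FLAG_UNMAPPED = 0x4         # segment is unmapped
FLAG_REVERSE = 0x10         # sequence is reverse complemented (minus strand)
FLAG_SECOND_IN_PAIR = 0x80  # last segment in the template (R2)


def classify(sam_line: str) -> str:
    """Return one of six labels for an alignment line (header lines are not handled)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second tab-delimited column
    # Assumption: a read is R2 only when the 0x80 bit is set, otherwise R1.
    mate = "r2" if flag & FLAG_SECOND_IN_PAIR else "r1"
    if flag & FLAG_UNMAPPED:
        return mate + "unmapped"
    strand = "minus" if flag & FLAG_REVERSE else "plus"
    return mate + strand
```

A caller would then write each line to the output associated with the returned label, skipping any label whose output was given as 'null'.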

File Management System

Performance Implementation

Statistics Collection

The tool tracks processing metrics using dedicated counters:

Use Cases

Strand-Specific Analysis

Separate reads by strand orientation for strand-specific RNA-seq analysis or antisense transcript detection.

Quality Control

Isolate unmapped reads for further analysis, adapter contamination checking, or alternative alignment strategies.

Differential Processing

Apply different processing pipelines to reads based on their mapping characteristics and pair orientation.

Library Preparation Assessment

Evaluate strand bias in sequencing libraries by comparing plus and minus strand read distributions.
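
One simple way to make this comparison is the fraction of mapped reads that fall on the plus strand, computed from the line counts of the plus and minus outputs. The Python sketch below is an illustration under that assumption. The function names are hypothetical, and the appropriate threshold for "biased" depends on the library type.

```python
def count_alignment_lines(lines) -> int:
    """Count non-empty lines that are not SAM header lines (those starting with '@')."""
    return sum(1 for line in lines if line.strip() and not line.startswith("@"))


def plus_fraction(plus_count: int, minus_count: int) -> float:
    """Return plus / (plus + minus); values near 0.5 suggest no strand preference."""
    total = plus_count + minus_count
    if total == 0:
        raise ValueError("no mapped reads to compare")
    return plus_count / total
```

For example, the counts could come from passing open file handles for r1_plus.sam and r1_minus.sam to count_alignment_lines.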

Support

For questions and support: