BBRealign

Script: bbrealign.sh
Package: var2
Class: Realign.java

Realigns mapped reads to a reference using dynamic programming alignment. The tool takes a SAM or BAM file and improves the alignment of its reads to the reference sequence, which is particularly useful for correcting alignment artifacts and improving indel detection accuracy.

Basic Usage

bbrealign.sh in=<file> ref=<file> out=<file>

Input may be a sorted or unsorted SAM or BAM file. The reference should be in FASTA format. Realigned reads are output in the same format as the input.
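When the tool is driven from a pipeline script, the key=value argument style can be assembled programmatically. The following is a minimal sketch only: build_bbrealign_command is a hypothetical helper name, not part of BBTools, and it assumes boolean options are written as t/f as in the defaults listed in this document.

```python
import shlex


def build_bbrealign_command(in_path, ref_path, out_path, jvm_flags=(), **params):
    """Return the argv list for a bbrealign.sh call (hypothetical helper)."""
    argv = ["bbrealign.sh", f"in={in_path}", f"ref={ref_path}", f"out={out_path}"]
    for key, value in params.items():
        # Assumption: booleans are passed as t/f, matching the documented defaults.
        if isinstance(value, bool):
            value = "t" if value else "f"
        argv.append(f"{key}={value}")
    # Java flags such as -Xmx20g are appended as plain arguments.
    argv.extend(jvm_flags)
    return argv


argv = build_bbrealign_command(
    "mapped.bam", "genome.fa", "realigned.bam",
    jvm_flags=["-Xmx20g"], minreadmapq=20, secondary=False,
)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a system where bbrealign.sh is on the PATH.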

Parameters

Parameters are organized by their function in the realignment process. The tool processes reads by first applying SAM filters, then performing realignment using dynamic programming, and finally applying quality trimming.

I/O parameters

in=<file>
Input reads in SAM or BAM format. Can be sorted or unsorted.
out=<file>
Output realigned reads in SAM or BAM format (same as input format).
ref=<file>
Reference genome in FASTA format used for realignment.
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file.

Trimming parameters

border=0
Trim at least this many bases on both ends of reads after realignment.
qtrim=r
Quality-trim reads on this end. Options: r (right), l (left), rl (both), f (don't quality-trim).
trimq=10
Quality-trim bases below this Phred score threshold.
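To illustrate how the three trimming parameters interact, here is a simplified sketch. It assumes plain threshold scanning from each end and treats border as a minimum number of bases removed per end; the trimming algorithm actually used by the tool may differ, and trim_read is a hypothetical name.

```python
def trim_read(bases, quals, qtrim="r", trimq=10, border=0):
    """Return (bases, quals) after simplified quality and border trimming."""
    n = len(bases)
    left = right = 0
    if qtrim in ("l", "rl"):
        # Scan from the left end while quality is below the threshold.
        while left < n and quals[left] < trimq:
            left += 1
    if qtrim in ("r", "rl"):
        # Scan from the right end, never crossing the left trim point.
        while right < n - left and quals[n - 1 - right] < trimq:
            right += 1
    # Assumption: border is a floor, so at least this many bases go from each end.
    left = max(left, border)
    right = max(right, border)
    if left + right >= n:
        return "", []
    return bases[left:n - right], quals[left:n - right]


bases = "ACGTACGT"
quals = [2, 30, 30, 30, 30, 30, 8, 5]
print(trim_read(bases, quals, qtrim="rl", trimq=10))
print(trim_read(bases, quals, qtrim="f", border=2))
```

With qtrim=f only the border applies; with qtrim=rl and border=0 only the low-quality ends are removed.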

Realignment parameters

unclip=f
Convert clip symbols that result from a read exceeding the ends of the realignment zone into matches and substitutions. When true, the clipped bases are included in the realignment.
repadding=70
Pad alignment by this many bases on each end of the realignment zone. Longer padding is more accurate for long indels but reduces speed. Recommended range: 50-200.
rerows=602
Maximum number of rows (read length) for the dynamic programming matrix. Reads longer than this cannot be realigned and will be passed through unchanged.
recols=2000
Maximum number of columns (reference segment length) for the dynamic programming matrix. Must be at least the read length plus the maximum deletion length plus twice the padding (repadding).
msa=
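Select the aligner class. Here MSA refers to the MultiStateAligner family of pairwise read-to-reference aligners, not to multiple sequence alignment. Options: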
• MultiStateAligner11ts (default): Optimized for Illumina reads
• MultiStateAligner9PacBio: Use for PacBio/Nanopore reads or Illumina reads mapped to long-read assemblies
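The sizing rule for recols can be checked with a little arithmetic before launching a run. The sketch below restates the rule given for recols (read length + maximum deletion length + 2 x repadding) and the rerows limit; the function names are hypothetical, and whether the limits are inclusive is an assumption.

```python
def min_recols(read_length, max_deletion, repadding=70):
    """Smallest recols value that satisfies the documented sizing rule."""
    return read_length + max_deletion + 2 * repadding


def fits_matrix(read_length, max_deletion, repadding=70, rerows=602, recols=2000):
    """True if a read of this length fits the configured matrix dimensions.

    Assumption: both limits are treated as inclusive.
    """
    rows_ok = read_length <= rerows
    cols_ok = min_recols(read_length, max_deletion, repadding) <= recols
    return rows_ok and cols_ok


# A 150 bp read with deletions up to 100 bp and default padding:
print(min_recols(150, 100))  # 150 + 100 + 2*70 = 390

# A 10 kb read does not fit the default 602 rows, but fits the enlarged matrix.
print(fits_matrix(10000, 500))
print(fits_matrix(10000, 500, rerows=10000, recols=50000))
```

Under these assumptions the defaults comfortably cover short reads, while long reads need both rerows and recols raised together.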

Sam-filtering parameters

minpos=
Start coordinate of the genomic range of interest. Alignments that do not overlap the range from minpos to maxpos are ignored.
maxpos=
End coordinate of the genomic range of interest. Alignments that do not overlap the range from minpos to maxpos are ignored.
minreadmapq=4
Ignore alignments with mapping quality lower than this value. Helps filter poorly mapped reads.
contigs=
Comma-delimited list of contig/chromosome names to include in processing. Names should have no spaces, or use underscores instead of spaces.
secondary=f
Include secondary alignments (reads with multiple mapping locations) in realignment processing.
supplementary=f
Include supplementary alignments (chimeric alignments) in realignment processing.
invert=f
Invert all SAM filtering criteria. Reads that would normally be excluded will be included and vice versa.
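The filters above combine into a single keep/ignore decision per alignment. The following sketch shows one plausible way to express that decision. The class and method names are hypothetical, the flag bits 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification, and the exact semantics of invert in the tool are an assumption (modeled here as negating the combined result).

```python
from dataclasses import dataclass
from typing import Optional, Set

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


@dataclass
class SamFilter:
    minpos: Optional[int] = None
    maxpos: Optional[int] = None
    minreadmapq: int = 4
    contigs: Optional[Set[str]] = None
    secondary: bool = False
    supplementary: bool = False
    invert: bool = False

    def _passes(self, flag, rname, start, end, mapq):
        if mapq < self.minreadmapq:
            return False
        if not self.secondary and flag & SECONDARY:
            return False
        if not self.supplementary and flag & SUPPLEMENTARY:
            return False
        if self.contigs is not None and rname not in self.contigs:
            return False
        # Overlap test against the minpos..maxpos range (inclusive coordinates).
        if self.minpos is not None and end < self.minpos:
            return False
        if self.maxpos is not None and start > self.maxpos:
            return False
        return True

    def keep(self, flag, rname, start, end, mapq):
        # Assumption: invert flips the combined outcome of all criteria.
        return self._passes(flag, rname, start, end, mapq) != self.invert


default_filter = SamFilter()
print(default_filter.keep(0, "chr1", 100, 250, 30))      # primary, MAPQ 30
print(default_filter.keep(0x100, "chr1", 100, 250, 30))  # secondary alignment
```

Here start and end are the first and last reference positions covered by the alignment.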

Java Parameters

-Xmx
Set Java's maximum heap memory usage, overriding autodetection. Examples: -Xmx20g (20 GB), -Xmx200m (200 MB). Maximum is typically 85% of physical memory.
-eoom
Exit process if an out-of-memory exception occurs. Requires Java 8u92 or later. Useful for preventing zombie processes.
-da
Disable Java assertions for slightly better performance in production use.

Examples

Basic Realignment

bbrealign.sh in=mapped.bam ref=genome.fa out=realigned.bam

Realigns all reads in a BAM file to improve alignment accuracy around indels.

PacBio/Nanopore Realignment

bbrealign.sh in=longreads.sam ref=assembly.fa out=realigned.sam msa=MultiStateAligner9PacBio rerows=10000 recols=50000

Settings for long reads: larger matrix dimensions and the PacBio-specific aligner.

Quality Trimming with Realignment

bbrealign.sh in=reads.bam ref=genome.fa out=trimmed_realigned.bam qtrim=rl trimq=20 border=5

Realigns reads, then quality-trims both ends at Q20 and removes at least 5 bases from each end.

Filtered Realignment

bbrealign.sh in=all_reads.bam ref=genome.fa out=high_quality.bam minreadmapq=20 secondary=f supplementary=f

Realigns only primary alignments with a mapping quality of at least 20, excluding secondary and supplementary alignments.

Algorithm Details

Realignment Strategy

BBRealign uses a MultiStateAligner (MSA) dynamic programming engine to realign reads that exhibit poor alignment characteristics. The Realigner class implements a glocal alignment strategy with padding around the original alignment region, and a realignment is retained only if it improves the alignment score.
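As an illustration of what glocal alignment means here, the sketch below aligns the whole read (global in the read) against any stretch of a padded reference window (local in the reference), then keeps the result only if it beats the original score. This is not the tool's aligner: the linear gap penalty, the score values, and the function names are assumptions chosen for brevity.

```python
def glocal_score(read, ref_window, match=1, mismatch=-1, gap=-2):
    """Return (best_score, end_column) for read vs. ref_window.

    The read must be consumed completely; leading and trailing reference
    bases are free, which is what makes the alignment glocal.
    """
    m, n = len(read), len(ref_window)
    prev = [0] * (n + 1)  # row 0: the read may start at any column at no cost
    for i in range(1, m + 1):
        curr = [prev[0] + gap] + [0] * n  # column 0: read bases with no reference
        for j in range(1, n + 1):
            same = read[i - 1] == ref_window[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap        # read base unmatched (insertion)
            left = curr[j - 1] + gap  # reference base skipped (deletion)
            curr[j] = max(diag, up, left)
        prev = curr
    # Trailing reference bases are free: take the best cell in the last row.
    end = max(range(n + 1), key=lambda j: prev[j])
    return prev[end], end


def keep_realignment(original_score, realigned_score):
    """Retain a realignment only if it strictly improves the score."""
    return realigned_score > original_score


score, end = glocal_score("ACGT", "TTACGTTT")
print(score, end)  # 4 6: the read matches reference columns 3-6 exactly
print(keep_realignment(original_score=2, realigned_score=score))
```

The padded window plays the role of repadding: a wider window gives the read more room to shift across an indel, at the cost of a larger matrix.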

1. Alignment Quality Assessment (Realigner.realign method)

2. Multiple Sequence Alignment Engine

3. Reference Padding Strategy

4. Alignment Score Evaluation

5. Clipping and Coordinate Management

6. Quality Control and Filtering

Implementation Characteristics

When to Use BBRealign

Support

For questions and support: