DriftingAligner

Script: driftingaligner.sh Package: aligner Class: DriftingAligner.java

Aligns a query sequence to a reference using DriftingAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.

Basic Usage

driftingaligner.sh <query> <ref>
driftingaligner.sh <query> <ref> <map>
driftingaligner.sh <query> <ref> <map> <iterations>

The tool accepts sequences as literal nucleotide strings or FASTA file paths. It outputs alignment identity and reference start/stop positions, with optional state space visualization for use with visualizealignment.sh.

Parameters

Parameters are provided as positional arguments in the order shown below.

Required Parameters

query: A literal nucleotide sequence or FASTA file path. The query sequence to be aligned against the reference.
ref: A literal nucleotide sequence or FASTA file path. The reference sequence that the query will be aligned to.

Optional Parameters

map: Optional output text file for matrix score space visualization. Set to null for benchmarking with no visualization. This file can be fed to visualizealignment.sh to create an alignment visualization image.
iterations: Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.

Examples

Basic Sequence Alignment

driftingaligner.sh ATCGATCGATCG ATCGATCGATCG

Aligns two identical sequences, should return identity of 1.0 with alignment spanning the full reference.

File-based Alignment

driftingaligner.sh query.fasta reference.fasta

Aligns sequences from FASTA files, outputting identity and alignment coordinates.

Alignment with Visualization

driftingaligner.sh query.fasta reference.fasta alignment_map.txt

Performs alignment and saves the state space exploration map to alignment_map.txt for visualization with visualizealignment.sh.

Benchmarking Mode

driftingaligner.sh ATCGATCGATCG ATCGATCGATCG null 1000

Runs the alignment 1000 times for benchmarking purposes without generating visualization output.

Algorithm Details

DriftingAligner implements a center-drifting banded dynamic programming algorithm using dual rolling arrays to achieve O(n) space complexity. The implementation features identity-responsive bandwidth adaptation and mathematical constraint solving for exact operation count recovery without traceback matrices.

Core Implementation Architecture

Center Drift Mechanism: Band center adjusts using drift=Tools.mid(-1, maxPos-center, maxDrift) where maxPos tracks the highest-scoring position and maxDrift=3 limits drift speed.
Identity-Responsive Bandwidth: Band width expands using scoreBonus=32-Integer.numberOfLeadingZeros(missingScore) where missingScore=i-score tracks alignment quality degradation.
Rolling Array Implementation: Uses prev[rLen+1] and curr[rLen+1] with array swapping (temp=prev; prev=curr; curr=temp) to achieve O(n) space complexity.
64-bit Bit-Packed Encoding: POSITION_BITS=21, DEL_BITS=21, SCORE_SHIFT=42 with masks POSITION_MASK=(1L << 21)-1, supporting sequences up to 2^21-1 = 2,097,151 bases.

Bandwidth Decision Algorithm

The decideBandwidth() method (lines 58-64) calculates initial band width using early mismatch detection:

bandwidth = Tools.mid(8, 1+Math.max(qLen, rLen)/16, 40+(int)Math.sqrt(rLen)/4)
for(int i=0, minlen=Math.min(qLen, rLen); i<minlen && subs<bandwidth; i++) {
    subs+=(query[i]!=ref[i] ? 1 : 0);
}
return Math.min(subs+1, bandwidth);

Bit-Field Architecture

The 64-bit encoding system uses specific bit ranges and constants:

Position Tracking (bits 0-20): POSITION_MASK extracts reference start position
Deletion Counting (bits 21-41): DEL_MASK=((1L<<21)-1)<<21 tracks deletion operations
Score Storage (bits 42-63): SCORE_MASK=~(POSITION_MASK|DEL_MASK) stores alignment scores

Scoring Constants and Operations

Match: MATCH=1L<<42 (+1 point shifted to score bits)
Substitution: SUB=(-1L)<<42 (-1 point shifted to score bits)
Insertion: INS=(-1L)<<42 (-1 point shifted to score bits)
Deletion: DEL_INCREMENT=DEL+(1L<<21) (-1 score + position increment)
N bases: N_SCORE=0L (neutral scoring for ambiguous bases)

Mathematical Constraint Solving

The postprocess() method (lines 194-244) uses a system of linear equations to recover exact operation counts:

// Extract packed information
int originPos = (int)(maxScore & POSITION_MASK);
int deletions = (int)((maxScore & DEL_MASK) >> POSITION_BITS);
int rawScore = (int)(maxScore >> SCORE_SHIFT);

// Solve constraint system: M+S+I=qLen, M+S+D=refAlnLength, Score=M-S-I-D
int insertions = Math.max(0, qLen+deletions-refAlnLength);
float matches = ((rawScore+qLen+deletions)/2f);
float substitutions = Math.max(0, qLen-matches-insertions);
float identity = matches/(matches+substitutions+insertions+deletions);

Band Navigation and Quality Response

Dynamic Band Width: bandWidth=bandWidth0+Math.max(20+bandWidth0*8-maxDrift*i, scoreBonus) expands for low-quality regions
Quarter-Band Buffer: quarterBand=bandWidth/4 provides asymmetric positioning for glocal alignment initialization
Band Boundaries: bandStart=Math.max(bandStart, center-bandWidth+quarterBand) and bandEnd=Math.min(rLen, center+bandWidth+quarterBand)
Quality Tracking: missingScore=i-score where i is current row and score is extracted from maxScore>>SCORE_SHIFT

Sequence Processing Optimizations

Branchless N Handling: isMatch=(q==r && q!='N'), hasN=(q=='N' || r=='N'), scoreAdd=isMatch ? MATCH : (hasN ? N_SCORE : SUB)
Sequence Swapping: Lines 75-79 swap sequences when posVector==null && query.length>ref.length to ensure query ≤ reference
Local Mode: GLOBAL=false configures for local alignment with maxPos tracking optimal endpoint rather than forced global alignment

Output Coordinate Calculation

The tool extracts alignment coordinates from bit-packed scores:

Identity: Calculated using constraint solving system with exact operation counts
Reference Start (rstart): originPos=(int)(maxScore&POSITION_MASK) extracts 0-based start position
Reference Stop (rstop): endPos=maxPos provides 0-based end position from highest-scoring cell

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org