DriftingAligner
Aligns a query sequence to a reference using DriftingAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.
Basic Usage
driftingaligner.sh <query> <ref>
driftingaligner.sh <query> <ref> <map>
driftingaligner.sh <query> <ref> <map> <iterations>
The tool accepts sequences as literal nucleotide strings or FASTA file paths. It outputs alignment identity and reference start/stop positions, with optional state space visualization for use with visualizealignment.sh.
Parameters
Parameters are provided as positional arguments in the order shown below.
Required Parameters
- query
- A literal nucleotide sequence or FASTA file path. The query sequence to be aligned against the reference.
- ref
- A literal nucleotide sequence or FASTA file path. The reference sequence that the query will be aligned to.
Optional Parameters
- map
- Optional output text file for matrix score space visualization. Set to null for benchmarking with no visualization. This file can be fed to visualizealignment.sh to create an alignment visualization image.
- iterations
- Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.
Examples
Basic Sequence Alignment
driftingaligner.sh ATCGATCGATCG ATCGATCGATCG
Aligns two identical sequences, should return identity of 1.0 with alignment spanning the full reference.
File-based Alignment
driftingaligner.sh query.fasta reference.fasta
Aligns sequences from FASTA files, outputting identity and alignment coordinates.
Alignment with Visualization
driftingaligner.sh query.fasta reference.fasta alignment_map.txt
Performs alignment and saves the state space exploration map to alignment_map.txt for visualization with visualizealignment.sh.
Benchmarking Mode
driftingaligner.sh ATCGATCGATCG ATCGATCGATCG null 1000
Runs the alignment 1000 times for benchmarking purposes without generating visualization output.
Algorithm Details
DriftingAligner implements a center-drifting banded dynamic programming algorithm using dual rolling arrays to achieve O(n) space complexity. The implementation features identity-responsive bandwidth adaptation and mathematical constraint solving for exact operation count recovery without traceback matrices.
Core Implementation Architecture
- Center Drift Mechanism: Band center adjusts using drift=Tools.mid(-1, maxPos-center, maxDrift) where maxPos tracks the highest-scoring position and maxDrift=3 limits drift speed.
- Identity-Responsive Bandwidth: Band width expands using scoreBonus=32-Integer.numberOfLeadingZeros(missingScore) where missingScore=i-score tracks alignment quality degradation.
- Rolling Array Implementation: Uses prev[rLen+1] and curr[rLen+1] with array swapping (temp=prev; prev=curr; curr=temp) to achieve O(n) space complexity.
- 64-bit Bit-Packed Encoding: POSITION_BITS=21, DEL_BITS=21, SCORE_SHIFT=42 with masks POSITION_MASK=(1L << 21)-1, supporting sequences up to 2^21-1 = 2,097,151 bases.
Bandwidth Decision Algorithm
The decideBandwidth() method (lines 58-64) calculates initial band width using early mismatch detection:
bandwidth = Tools.mid(8, 1+Math.max(qLen, rLen)/16, 40+(int)Math.sqrt(rLen)/4)
for(int i=0, minlen=Math.min(qLen, rLen); i<minlen && subs<bandwidth; i++) {
subs+=(query[i]!=ref[i] ? 1 : 0);
}
return Math.min(subs+1, bandwidth);
Bit-Field Architecture
The 64-bit encoding system uses specific bit ranges and constants:
- Position Tracking (bits 0-20): POSITION_MASK extracts reference start position
- Deletion Counting (bits 21-41): DEL_MASK=((1L<<21)-1)<<21 tracks deletion operations
- Score Storage (bits 42-63): SCORE_MASK=~(POSITION_MASK|DEL_MASK) stores alignment scores
Scoring Constants and Operations
- Match: MATCH=1L<<42 (+1 point shifted to score bits)
- Substitution: SUB=(-1L)<<42 (-1 point shifted to score bits)
- Insertion: INS=(-1L)<<42 (-1 point shifted to score bits)
- Deletion: DEL_INCREMENT=DEL+(1L<<21) (-1 score + position increment)
- N bases: N_SCORE=0L (neutral scoring for ambiguous bases)
Mathematical Constraint Solving
The postprocess() method (lines 194-244) uses a system of linear equations to recover exact operation counts:
// Extract packed information
int originPos = (int)(maxScore & POSITION_MASK);
int deletions = (int)((maxScore & DEL_MASK) >> POSITION_BITS);
int rawScore = (int)(maxScore >> SCORE_SHIFT);
// Solve constraint system: M+S+I=qLen, M+S+D=refAlnLength, Score=M-S-I-D
int insertions = Math.max(0, qLen+deletions-refAlnLength);
float matches = ((rawScore+qLen+deletions)/2f);
float substitutions = Math.max(0, qLen-matches-insertions);
float identity = matches/(matches+substitutions+insertions+deletions);
Band Navigation and Quality Response
- Dynamic Band Width: bandWidth=bandWidth0+Math.max(20+bandWidth0*8-maxDrift*i, scoreBonus) expands for low-quality regions
- Quarter-Band Buffer: quarterBand=bandWidth/4 provides asymmetric positioning for glocal alignment initialization
- Band Boundaries: bandStart=Math.max(bandStart, center-bandWidth+quarterBand) and bandEnd=Math.min(rLen, center+bandWidth+quarterBand)
- Quality Tracking: missingScore=i-score where i is current row and score is extracted from maxScore>>SCORE_SHIFT
Sequence Processing Optimizations
- Branchless N Handling: isMatch=(q==r && q!='N'), hasN=(q=='N' || r=='N'), scoreAdd=isMatch ? MATCH : (hasN ? N_SCORE : SUB)
- Sequence Swapping: Lines 75-79 swap sequences when posVector==null && query.length>ref.length to ensure query ≤ reference
- Local Mode: GLOBAL=false configures for local alignment with maxPos tracking optimal endpoint rather than forced global alignment
Output Coordinate Calculation
The tool extracts alignment coordinates from bit-packed scores:
- Identity: Calculated using constraint solving system with exact operation counts
- Reference Start (rstart): originPos=(int)(maxScore&POSITION_MASK) extracts 0-based start position
- Reference Stop (rstop): endPos=maxPos provides 0-based end position from highest-scoring cell
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org