QuantumAligner
Aligns a query sequence to a reference using QuantumAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.
Basic Usage
quantumaligner.sh <query> <ref>
quantumaligner.sh <query> <ref> <map>
quantumaligner.sh <query> <ref> <map> <iterations>
QuantumAligner performs sequence alignment using sparse matrix exploration with IntList activeList and nextList tracking only positions within adaptive bandwidth. Uses dual long arrays prev[rLen+1] and curr[rLen+1] achieving O(n) space complexity while avoiding traceback operations through mathematical constraint solving.
Parameters
QuantumAligner uses positional arguments rather than named parameters. All parameters are optional except query and reference sequences.
Input Parameters
- query
- A literal nucleotide sequence or fasta file. The query sequence to be aligned to the reference. Can handle any characters, with special handling for 'N' bases.
- ref
- A literal nucleotide sequence or fasta file. The reference sequence to align against. Maximum length is limited to 2Mbp due to 21 position bits encoding.
- map
- Optional output text file for matrix score space visualization. Contains the state space exploration map that can be fed to visualizealignment.sh to generate alignment visualization images. Set to null for benchmarking with no visualization output.
- iterations
- Optional integer for benchmarking multiple iterations. Allows running the alignment multiple times for performance testing and timing analysis.
Examples
Basic Alignment
quantumaligner.sh ATCGATCGATCG ATCGATCGATCG
Aligns two identical sequences, should return 100% identity.
File-Based Alignment
quantumaligner.sh query.fasta reference.fasta
Aligns sequences from FASTA files.
Alignment with Visualization
quantumaligner.sh query.fasta reference.fasta alignment_map.txt
Performs alignment and outputs visualization map to alignment_map.txt for later visualization.
Benchmarking
quantumaligner.sh ATCGATCGATCG ATCGATCGATCG null 1000
Runs alignment 1000 times for benchmarking without generating visualization output.
Algorithm Details
QuantumAligner implements sparse matrix alignment using decideBandwidth() method and active position tracking through IntList structures with quantum teleportation features for handling long deletions:
Core Implementation Features
- Sparse Matrix Architecture: Uses IntList activeList and nextList to track only positions within bandwidth constraints, avoiding computation of BAD (Long.MIN_VALUE/2) regions
- Dual Array System: Two long arrays prev[rLen+1] and curr[rLen+1] achieving O(n) space complexity while maintaining full alignment accuracy
- Mathematical Constraint Solving: System of equations M+S+I=qLen, M+S+D=refAlnLength, Score=M-S-I-D for exact operation count recovery without traceback matrices
- Quantum Teleportation: Mathematical optimization allowing jumps between high-scoring regions using maxScore+DEL_INCREMENT*(j-maxPos) calculation across unexplored gaps
Bandwidth Calculation Algorithm
- decideBandwidth() Method: Calculates bandwidth=Tools.min(qLen/4+2, maxLen/32, (int)Tools.log2(maxLen+256)+2) with minimum of 2, plus 3 buffer
- Substitution-Based Optimization: Counts mismatches in sequence prefix to optimize bandwidth: Math.min(subs+1, bandwidth)
- Adaptive Processing: Bandwidth adjustment based on sequence identity patterns for high-identity sequence optimization
- Memory Efficiency: Limits active positions to typically much less than full reference length through sparse exploration
Scoring System Implementation
- Match Score: MATCH = 1L << SCORE_SHIFT for matching bases when q==r && q!='N'
- Substitution Score: SUB = (-1L) << SCORE_SHIFT for mismatches
- Insertion Score: INS = (-1L) << SCORE_SHIFT for query insertions
- Deletion Score: DEL = (-1L) << SCORE_SHIFT with position increment DEL_INCREMENT = DEL + (1L << POSITION_BITS)
- N-base Handling: N_SCORE = 0L when hasN = (q=='N' || r=='N')
- Invalid Marker: BAD = Long.MIN_VALUE/2 marks uncomputed/invalid positions
Bridge Construction Mechanism
- Periodic Expansion: BRIDGE_PERIOD=16 iterations between bridge construction events
- Position Addition: Adds up to 35 extra positions when !nextMatch && bridgeTime<1 for catching long deletions
- Bridge Timing: bridgeTime countdown with reset to BRIDGE_PERIOD after bridge construction
- Match Extension: EXTEND_MATCH=true automatically adds j+1 to nextList when isMatch && last < j+1
Bit Field Architecture
- Position Encoding: POSITION_BITS=21 supporting sequences up to 2,097,151 bases (POSITION_MASK=(1L<<21)-1)
- Deletion Tracking: DEL_BITS=21 with DEL_MASK=((1L<<21)-1)<<POSITION_BITS for operation counting
- Score Storage: SCORE_SHIFT=42 with remaining 22 bits for alignment scores
- Bit Field Layout: [Score: bits 42-63][Deletions: bits 21-41][Position: bits 0-20]
Performance Optimization Features
- Branchless Scoring: Conditional expressions like maxValue=(maxDiagUp&SCORE_MASK)>=leftScore for CPU pipeline optimization
- Loop-Free Position Extension: LOOP_VERSION=false uses unrolled position extension with rightExtend=max(5, bandwidth-2)
- Dense Top Processing: Optional DENSE_TOP mode for first topWidth rows with full matrix computation
- Memory Management: 200MB default allocation via parseJavaArgs("--mem=200m", "--mode=fixed")
Identity Calculation Process
- Operation Count Recovery: Extracts originPos=(int)(maxScore&POSITION_MASK) and deletions from bit fields
- Mathematical Solving: insertions=max(0, qLen+deletions-refAlnLength), matches=(rawScore+qLen+deletions)/2f
- Final Identity: matches/(matches+substitutions+insertions+deletions) with complete operation accounting
- Coordinate Extraction: Reference start/stop positions from packed bit fields without traceback
Visualization Integration
- Visualizer Object: Created when output!=null for matrix exploration pattern recording
- State Space Mapping: Records sparse matrix exploration patterns for visualizealignment.sh processing
- Debug Support: Optional PRINT_OPS=false flag for detailed operation count debugging and validation
- Performance Monitoring: AtomicLong loops counter tracking computational work via loops.addAndGet(mloops)
Technical Implementation
Sequence Processing
- Sequences can contain any characters, with special handling for N bases using hasN=((q|r)>=15) logic
- Automatic sequence swapping when posVector==null && query.length>ref.length for memory optimization
- Maximum reference sequence length is 2,097,151 bases due to 21-bit position encoding limitation
- Length validation: assert(ref.length<=POSITION_MASK) preventing overflow in bit-packed operations
Memory Architecture
- Primary arrays: Two long[rLen+1] arrays (prev and curr) = 16 * (rLen+1) bytes
- Active tracking: IntList activeList and nextList with capacity rLen+4 for sparse position management
- JVM allocation: 200MB via parseJavaArgs("--mem=200m", "--mode=fixed") for matrix computation
- Space complexity: O(rLen) rather than O(qLen * rLen) of traditional dynamic programming
- Visualizer overhead: Optional Visualizer object allocation when output file specified
Performance Characteristics
- Time complexity: O(qLen * activePositions) where activePositions << rLen due to sparse optimization
- Bandwidth-limited processing: activeList.size-1 positions per row instead of full rLen processing
- Bridge construction: Periodic addition of up to 35 positions every 16 iterations for deletion recovery
- Insertion penalty: insPad=-16-rLen/128-(int)Math.sqrt(rLen)-(5*Math.max(0, rLen-qLen))/4 threshold
- Loop counting: Comprehensive mloops tracking with loops.addAndGet() for performance analysis
Output Format
- Identity Value: Float (0.0-1.0) calculated from complete operation counts via mathematical constraint solving
- Reference Coordinates: Starting position (rStart) and ending position (rStop) extracted from bit-packed score values
- Optional Statistics: Raw alignment score (bits 42-63) and deletion counts (bits 21-41) when posVector.length > 2
- Global Mode Output: When GLOBAL=true, forces maxPos=rLen and maxScore=prev[rLen-1]+DEL_INCREMENT adjustment
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org