QuantumAligner

Basic Usage

quantumaligner.sh <query> <ref>
quantumaligner.sh <query> <ref> <map>
quantumaligner.sh <query> <ref> <map> <iterations>

QuantumAligner performs sequence alignment using sparse matrix exploration with IntList activeList and nextList tracking only positions within adaptive bandwidth. Uses dual long arrays prev[rLen+1] and curr[rLen+1] achieving O(n) space complexity while avoiding traceback operations through mathematical constraint solving.

Parameters

QuantumAligner uses positional arguments rather than named parameters. All parameters are optional except query and reference sequences.

Input Parameters

query: A literal nucleotide sequence or fasta file. The query sequence to be aligned to the reference. Can handle any characters, with special handling for 'N' bases.
ref: A literal nucleotide sequence or fasta file. The reference sequence to align against. Maximum length is limited to 2Mbp due to 21 position bits encoding.
map: Optional output text file for matrix score space visualization. Contains the state space exploration map that can be fed to visualizealignment.sh to generate alignment visualization images. Set to null for benchmarking with no visualization output.
iterations: Optional integer for benchmarking multiple iterations. Allows running the alignment multiple times for performance testing and timing analysis.

Examples

Basic Alignment

quantumaligner.sh ATCGATCGATCG ATCGATCGATCG

Aligns two identical sequences, should return 100% identity.

File-Based Alignment

quantumaligner.sh query.fasta reference.fasta

Aligns sequences from FASTA files.

Alignment with Visualization

quantumaligner.sh query.fasta reference.fasta alignment_map.txt

Performs alignment and outputs visualization map to alignment_map.txt for later visualization.

Benchmarking

quantumaligner.sh ATCGATCGATCG ATCGATCGATCG null 1000

Runs alignment 1000 times for benchmarking without generating visualization output.

Algorithm Details

QuantumAligner implements sparse matrix alignment using decideBandwidth() method and active position tracking through IntList structures with quantum teleportation features for handling long deletions:

Core Implementation Features

Sparse Matrix Architecture: Uses IntList activeList and nextList to track only positions within bandwidth constraints, avoiding computation of BAD (Long.MIN_VALUE/2) regions
Dual Array System: Two long arrays prev[rLen+1] and curr[rLen+1] achieving O(n) space complexity while maintaining full alignment accuracy
Mathematical Constraint Solving: System of equations M+S+I=qLen, M+S+D=refAlnLength, Score=M-S-I-D for exact operation count recovery without traceback matrices
Quantum Teleportation: Mathematical optimization allowing jumps between high-scoring regions using maxScore+DEL_INCREMENT*(j-maxPos) calculation across unexplored gaps

Bandwidth Calculation Algorithm

decideBandwidth() Method: Calculates bandwidth=Tools.min(qLen/4+2, maxLen/32, (int)Tools.log2(maxLen+256)+2) with minimum of 2, plus 3 buffer
Substitution-Based Optimization: Counts mismatches in sequence prefix to optimize bandwidth: Math.min(subs+1, bandwidth)
Adaptive Processing: Bandwidth adjustment based on sequence identity patterns for high-identity sequence optimization
Memory Efficiency: Limits active positions to typically much less than full reference length through sparse exploration

Scoring System Implementation

Match Score: MATCH = 1L << SCORE_SHIFT for matching bases when q==r && q!='N'
Substitution Score: SUB = (-1L) << SCORE_SHIFT for mismatches
Insertion Score: INS = (-1L) << SCORE_SHIFT for query insertions
Deletion Score: DEL = (-1L) << SCORE_SHIFT with position increment DEL_INCREMENT = DEL + (1L << POSITION_BITS)
N-base Handling: N_SCORE = 0L when hasN = (q=='N' || r=='N')
Invalid Marker: BAD = Long.MIN_VALUE/2 marks uncomputed/invalid positions

Bridge Construction Mechanism

Periodic Expansion: BRIDGE_PERIOD=16 iterations between bridge construction events
Position Addition: Adds up to 35 extra positions when !nextMatch && bridgeTime<1 for catching long deletions
Bridge Timing: bridgeTime countdown with reset to BRIDGE_PERIOD after bridge construction
Match Extension: EXTEND_MATCH=true automatically adds j+1 to nextList when isMatch && last < j+1

Bit Field Architecture

Position Encoding: POSITION_BITS=21 supporting sequences up to 2,097,151 bases (POSITION_MASK=(1L<<21)-1)
Deletion Tracking: DEL_BITS=21 with DEL_MASK=((1L<<21)-1)<<POSITION_BITS for operation counting
Score Storage: SCORE_SHIFT=42 with remaining 22 bits for alignment scores
Bit Field Layout: [Score: bits 42-63][Deletions: bits 21-41][Position: bits 0-20]

Performance Optimization Features

Branchless Scoring: Conditional expressions like maxValue=(maxDiagUp&SCORE_MASK)>=leftScore for CPU pipeline optimization
Loop-Free Position Extension: LOOP_VERSION=false uses unrolled position extension with rightExtend=max(5, bandwidth-2)
Dense Top Processing: Optional DENSE_TOP mode for first topWidth rows with full matrix computation
Memory Management: 200MB default allocation via parseJavaArgs("--mem=200m", "--mode=fixed")

Identity Calculation Process

Operation Count Recovery: Extracts originPos=(int)(maxScore&POSITION_MASK) and deletions from bit fields
Mathematical Solving: insertions=max(0, qLen+deletions-refAlnLength), matches=(rawScore+qLen+deletions)/2f
Final Identity: matches/(matches+substitutions+insertions+deletions) with complete operation accounting
Coordinate Extraction: Reference start/stop positions from packed bit fields without traceback

Visualization Integration

Visualizer Object: Created when output!=null for matrix exploration pattern recording
State Space Mapping: Records sparse matrix exploration patterns for visualizealignment.sh processing
Debug Support: Optional PRINT_OPS=false flag for detailed operation count debugging and validation
Performance Monitoring: AtomicLong loops counter tracking computational work via loops.addAndGet(mloops)

Technical Implementation

Sequence Processing

Sequences can contain any characters, with special handling for N bases using hasN=((q|r)>=15) logic
Automatic sequence swapping when posVector==null && query.length>ref.length for memory optimization
Maximum reference sequence length is 2,097,151 bases due to 21-bit position encoding limitation
Length validation: assert(ref.length<=POSITION_MASK) preventing overflow in bit-packed operations

Memory Architecture

Primary arrays: Two long[rLen+1] arrays (prev and curr) = 16 * (rLen+1) bytes
Active tracking: IntList activeList and nextList with capacity rLen+4 for sparse position management
JVM allocation: 200MB via parseJavaArgs("--mem=200m", "--mode=fixed") for matrix computation
Space complexity: O(rLen) rather than O(qLen * rLen) of traditional dynamic programming
Visualizer overhead: Optional Visualizer object allocation when output file specified

Performance Characteristics

Time complexity: O(qLen * activePositions) where activePositions << rLen due to sparse optimization
Bandwidth-limited processing: activeList.size-1 positions per row instead of full rLen processing
Bridge construction: Periodic addition of up to 35 positions every 16 iterations for deletion recovery
Insertion penalty: insPad=-16-rLen/128-(int)Math.sqrt(rLen)-(5*Math.max(0, rLen-qLen))/4 threshold
Loop counting: Comprehensive mloops tracking with loops.addAndGet() for performance analysis

Output Format

Identity Value: Float (0.0-1.0) calculated from complete operation counts via mathematical constraint solving
Reference Coordinates: Starting position (rStart) and ending position (rStop) extracted from bit-packed score values
Optional Statistics: Raw alignment score (bits 42-63) and deletion counts (bits 21-41) when posVector.length > 2
Global Mode Output: When GLOBAL=true, forces maxPos=rLen and maxScore=prev[rLen-1]+DEL_INCREMENT adjustment

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org