QuabbleAligner
Aligns a query sequence to a reference using QuabbleAligner. Uses only 2 arrays and avoids traceback while giving an exact answer. Calculates rstart and rstop without traceback. Limited to length 2Mbp with 21 position bits.
Basic Usage
quabblealigner.sh <query> <ref>
quabblealigner.sh <query> <ref> <map>
quabblealigner.sh <query> <ref> <map> <iterations>
Parameters
QuabbleAligner accepts the following positional parameters for sequence alignment:
Parameters
- query
- A literal nucleotide sequence or fasta file. The query sequence to be aligned against the reference. Can contain any characters, but 'N' is treated as a special ambiguous nucleotide case.
- ref
- A literal nucleotide sequence or fasta file. The reference sequence that the query will be aligned to. Maximum length is limited to 2Mbp (2,097,152 bp) due to 21-bit position encoding.
- map
- Optional output text file for matrix score space visualization. Set to "null" for benchmarking with no visualization output. This file can be fed to visualizealignment.sh to create an image representation of the alignment state space exploration.
- iterations
- Optional integer for benchmarking multiple iterations of the same alignment. Used for performance testing to measure alignment speed and consistency across multiple runs.
Examples
Basic Sequence Alignment
quabblealigner.sh ATCGATCG ATCGATCGATCG
Aligns the query sequence "ATCGATCG" to the reference "ATCGATCGATCG" and outputs identity percentage along with reference start and stop positions.
File-based Alignment
quabblealigner.sh query.fasta ref.fasta
Aligns sequences from FASTA files, reading the query from query.fasta and reference from ref.fasta.
Alignment with Visualization
quabblealigner.sh ATCGATCG ATCGATCGATCG alignment_map.txt
Performs alignment and outputs the state space exploration matrix to alignment_map.txt, which can be visualized using visualizealignment.sh.
Benchmarking Mode
quabblealigner.sh ATCGATCG ATCGATCGATCG null 1000
Runs the alignment 1000 times for benchmarking purposes without generating visualization output (map set to null).
Algorithm Details
Core Algorithm
QuabbleAligner implements traceback-free pairwise sequence alignment using dual rolling arrays prev[rLen+1] and curr[rLen+1] with O(n) space complexity. The algorithm calculates Average Nucleotide Identity (ANI) through mathematical constraint solving without storing traceback matrices, using bit-packed encoding of position, deletion count, and score information.
Key Features
- Memory Efficiency: Uses only 2 arrays (current and previous row) instead of a full dynamic programming matrix
- Exact Results: Provides precise alignment scores and positions without approximation
- No Traceback: Calculates reference start (rstart) and stop (rstop) positions without traditional traceback
- Sparse Processing: Uses adaptive bandwidth and sparse matrix exploration to skip unnecessary calculations
- Position Encoding: Limited to 2Mbp sequences using 21-bit position encoding
Scoring System
The algorithm uses a bit-packed scoring system with the following components:
- Match: +1 score for identical nucleotides (excluding 'N')
- Mismatch: -1 score for different nucleotides
- Insertion: -1 score for gaps in the reference
- Deletion: -1 score for gaps in the query
- N-scores: 0 score when either sequence contains 'N' (ambiguous nucleotide)
Adaptive Bandwidth
QuabbleAligner automatically determines the alignment bandwidth based on sequence characteristics:
- Bandwidth = min(qLen/4+2, maxLen/32, log2(maxLen+256)+2)
- Minimum bandwidth of 2, with additional padding of +2
- Early substitution counting to optimize bandwidth for high-identity alignments
Sparse Matrix Strategy
The algorithm employs IntList-based sparse processing for memory efficiency:
- Dense Top Band: Optional dense processing for the first few rows controlled by DENSE_TOP boolean flag
- Active Lists: IntList activeList and nextList structures skip empty matrix regions by tracking only populated positions
- Bridge Building: BUILD_BRIDGES feature enables long deletion handling using maxScore+DEL_INCREMENT*(j-maxPos) calculation for gap traversal
- Score-based Pruning: Eliminates positions using scoreDif=prevRowScore-maxValue threshold with scoreWidth boundary conditions
Bit Field Organization
Each score entry contains packed information in a 64-bit long:
- Position bits (21): Reference starting position (bits 0-20)
- Deletion bits (21): Deletion count (bits 21-41)
- Score bits (22): Alignment score (bits 42-63)
Performance Characteristics
- Time Complexity: O(nm) worst case, but typically much faster due to sparse processing
- Space Complexity: O(m) where m is reference length
- Memory Usage: Approximately 200MB default memory allocation
- Sequence Limits: Maximum reference length of 2,097,152 bp (2^21 - 1)
Output Information
QuabbleAligner returns detailed alignment statistics:
- Identity: Fraction of matching positions (0.0-1.0)
- Reference Start: Starting position in the reference sequence
- Reference Stop: Ending position in the reference sequence
- Raw Score: Total alignment score
- Operations: Counts of matches, substitutions, insertions, and deletions
Special Handling
- N nucleotides: Treated as neutral (no penalty or bonus)
- Sequence swapping: Automatically swaps sequences if query is longer than reference (when position vector is not needed)
- Global alignment mode: Optional global alignment forcing (GLOBAL flag)
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org