WobbleAligner
Aligns a query sequence to a reference using WobbleAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.
Basic Usage
wobblealigner.sh <query> <ref>
wobblealigner.sh <query> <ref> <map>
wobblealigner.sh <query> <ref> <map> <iterations>
WobbleAligner accepts positional arguments for query sequence, reference sequence, optional output file for matrix visualization, and optional iteration count for benchmarking.
Parameters
WobbleAligner uses positional parameters rather than named parameters. All parameters are provided as command line arguments in order.
Required Parameters
- query
- A literal nucleotide sequence or path to a fasta file containing the query sequence. This sequence will be aligned to the reference.
- ref
- A literal nucleotide sequence or path to a fasta file containing the reference sequence. The query will be aligned against this reference.
Optional Parameters
- map
- Optional output text file for matrix score space visualization. This file contains the state space exploration map that can be fed to visualizealignment.sh to create an image representation of the alignment process. Set to "null" for benchmarking runs with no visualization output.
- iterations
- Optional integer specifying the number of iterations for benchmarking multiple alignment runs. Useful for performance testing and timing measurements.
Examples
Basic Sequence Alignment
wobblealigner.sh ATCGATCGATCG ATCGATCGATCG
Aligns two identical sequences directly from command line. Returns perfect identity (1.0) with alignment coordinates.
File-based Alignment
wobblealigner.sh query.fasta reference.fasta
Aligns sequences from fasta files. The tool will read the first sequence from each file for alignment.
Alignment with Visualization
wobblealigner.sh query.fasta reference.fasta alignment_matrix.txt
Performs alignment and outputs the state space matrix to a text file for visualization. The matrix file can be processed with visualizealignment.sh.
Benchmarking Mode
wobblealigner.sh ATCGATCG ATCGATCG null 1000
Runs alignment 1000 times for performance benchmarking without generating visualization output.
Sequences with N Characters
wobblealigner.sh ATCNGATCG ATCAGATCG matrix.txt
Aligns sequences containing N characters, which receive special scoring treatment (neutral score of 0).
Algorithm Details
WobbleAligner Architecture
WobbleAligner is a traceback-free sequence alignment algorithm using 2-array dynamic programming with 64-bit bit-packed encoding. It implements RingBuffer-based adaptive banding and mathematical constraint solving for exact operation count recovery:
Memory Design
The algorithm uses only 2 arrays instead of a full dynamic programming matrix, avoiding traceback while still calculating exact alignment coordinates (rstart and rstop). This approach significantly reduces memory requirements compared to traditional alignment algorithms.
Adaptive Banding Strategy
WobbleAligner employs a RingBuffer-based adaptive banding mechanism with decideBandwidth() method that:
- Dynamic bandwidth adjustment: Band width starts wide and narrows to allow glocal alignments, then dynamically widens and narrows based on sequence identity
- Bandwidth calculation: Initial bandwidth is calculated as mid(7, 1+max(qLen, rLen)/24, 24+sqrt(rLen)/6) with early termination for high-identity sequences
- Center drift tracking: The center of the band drifts toward the highest-scoring alignment path with controlled drift limits (maxDrift=3)
- Score-based expansion: Band width increases when recent alignment quality is poor, providing additional exploration space
Scoring System
The algorithm uses fixed scoring constants with MATCH=1L<<SCORE_SHIFT, SUB=-1L<<SCORE_SHIFT, INS=-1L<<SCORE_SHIFT, DEL=-1L<<SCORE_SHIFT, N_SCORE=0L:
- Match score: +1 for identical nucleotides (except N)
- Substitution penalty: -1 for mismatched nucleotides
- Insertion penalty: -1 for gaps in reference
- Deletion penalty: -1 for gaps in query
- N character handling: Score of 0 for alignments involving N, treating ambiguous bases neutrally
64-bit Bit-Packed Encoding
WobbleAligner uses 64-bit bit-packing with POSITION_BITS=21, DEL_BITS=21, SCORE_SHIFT=42 to store multiple pieces of information in a single long integer:
- Position bits (21 bits): Store alignment start position (supports sequences up to 2Mbp)
- Deletion bits (21 bits): Track deletion count for gap penalty calculation
- Score bits (remaining bits): Store the actual alignment score
Performance Characteristics
The algorithm has the following performance characteristics:
- Linear space complexity: O(reference_length) memory usage regardless of query length
- Reduced time complexity: Banding limits computation to O(bandwidth × query_length) instead of O(query_length × reference_length)
- Cache efficiency: Sequential access patterns and compact data structures improve cache performance
- Early termination: High-identity sequences with few mismatches use minimal bandwidth
RingBuffer Implementation
The algorithm uses RingBuffer with configurable ringSize=bandWidth0 to track recent alignment scores. scoreBonus calculation uses RING_MULT*min(ringSize, recentMissingScore) for bandwidth expansion when alignment quality degrades.
Sequence Length Limitations
WobbleAligner supports sequences up to 2 Megabases (2,097,151 bp) due to the 21-bit position encoding with POSITION_MASK=(1L << POSITION_BITS)-1. Length validation uses assert(ref.length<=POSITION_MASK).
Identity Calculation
The algorithm solves a system of equations to calculate exact operation counts:
- M + S + I = query_length (matches + substitutions + insertions)
- M + S + D = reference_alignment_length (matches + substitutions + deletions)
- Score = M - S - I - D (match bonus minus penalties)
This provides accurate identity calculation without requiring traceback through the alignment matrix.
Output Format
WobbleAligner outputs alignment results in a concise format:
- Identity: Alignment identity as a decimal value (0.0 to 1.0)
- Reference start (rstart): Starting position of alignment in reference sequence (0-based)
- Reference stop (rstop): Ending position of alignment in reference sequence (0-based)
- Matrix file (optional): State space exploration visualization for analysis with visualizealignment.sh
Use Cases
- High-identity alignments: Optimized for sequences with >90% identity
- Glocal alignment: Finds best local alignment within sequences
- Memory-constrained environments: Minimal memory footprint for large sequences
- Alignment visualization: Matrix output enables detailed alignment analysis
- Performance benchmarking: Built-in iteration support for timing studies
- Research applications: Exact results suitable for publication-quality analysis
Technical Notes
- WobbleAligner automatically swaps query and reference to ensure query ≤ reference length when position vector is not needed
- The algorithm provides exact alignment results, not heuristic approximations
- N characters receive special treatment with neutral scoring (score = 0)
- Global alignment mode is available but disabled by default (GLOBAL=false)
- The visualization matrix can be large for long sequences; use "null" to disable for performance testing
- Thread-safe implementation with atomic loop counters for concurrent usage
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org