WobbleAligner

Basic Usage

wobblealigner.sh <query> <ref>
wobblealigner.sh <query> <ref> <map>
wobblealigner.sh <query> <ref> <map> <iterations>

WobbleAligner accepts positional arguments for query sequence, reference sequence, optional output file for matrix visualization, and optional iteration count for benchmarking.

Parameters

WobbleAligner uses positional parameters rather than named parameters. All parameters are provided as command line arguments in order.

Required Parameters

query: A literal nucleotide sequence or path to a fasta file containing the query sequence. This sequence will be aligned to the reference.
ref: A literal nucleotide sequence or path to a fasta file containing the reference sequence. The query will be aligned against this reference.

Optional Parameters

map: Optional output text file for matrix score space visualization. This file contains the state space exploration map that can be fed to visualizealignment.sh to create an image representation of the alignment process. Set to "null" for benchmarking runs with no visualization output.
iterations: Optional integer specifying the number of iterations for benchmarking multiple alignment runs. Useful for performance testing and timing measurements.

Examples

Basic Sequence Alignment

wobblealigner.sh ATCGATCGATCG ATCGATCGATCG

Aligns two identical sequences directly from command line. Returns perfect identity (1.0) with alignment coordinates.

File-based Alignment

wobblealigner.sh query.fasta reference.fasta

Aligns sequences from fasta files. The tool will read the first sequence from each file for alignment.

Alignment with Visualization

wobblealigner.sh query.fasta reference.fasta alignment_matrix.txt

Performs alignment and outputs the state space matrix to a text file for visualization. The matrix file can be processed with visualizealignment.sh.

Benchmarking Mode

wobblealigner.sh ATCGATCG ATCGATCG null 1000

Runs alignment 1000 times for performance benchmarking without generating visualization output.

Sequences with N Characters

wobblealigner.sh ATCNGATCG ATCAGATCG matrix.txt

Aligns sequences containing N characters, which receive special scoring treatment (neutral score of 0).

Algorithm Details

WobbleAligner Architecture

WobbleAligner is a traceback-free sequence alignment algorithm using 2-array dynamic programming with 64-bit bit-packed encoding. It implements RingBuffer-based adaptive banding and mathematical constraint solving for exact operation count recovery:

Memory Design

The algorithm uses only 2 arrays instead of a full dynamic programming matrix, avoiding traceback while still calculating exact alignment coordinates (rstart and rstop). This approach significantly reduces memory requirements compared to traditional alignment algorithms.

Adaptive Banding Strategy

WobbleAligner employs a RingBuffer-based adaptive banding mechanism with decideBandwidth() method that:

Dynamic bandwidth adjustment: Band width starts wide and narrows to allow glocal alignments, then dynamically widens and narrows based on sequence identity
Bandwidth calculation: Initial bandwidth is calculated as mid(7, 1+max(qLen, rLen)/24, 24+sqrt(rLen)/6) with early termination for high-identity sequences
Center drift tracking: The center of the band drifts toward the highest-scoring alignment path with controlled drift limits (maxDrift=3)
Score-based expansion: Band width increases when recent alignment quality is poor, providing additional exploration space

Scoring System

The algorithm uses fixed scoring constants with MATCH=1L<<SCORE_SHIFT, SUB=-1L<<SCORE_SHIFT, INS=-1L<<SCORE_SHIFT, DEL=-1L<<SCORE_SHIFT, N_SCORE=0L:

Match score: +1 for identical nucleotides (except N)
Substitution penalty: -1 for mismatched nucleotides
Insertion penalty: -1 for gaps in reference
Deletion penalty: -1 for gaps in query
N character handling: Score of 0 for alignments involving N, treating ambiguous bases neutrally

64-bit Bit-Packed Encoding

WobbleAligner uses 64-bit bit-packing with POSITION_BITS=21, DEL_BITS=21, SCORE_SHIFT=42 to store multiple pieces of information in a single long integer:

Position bits (21 bits): Store alignment start position (supports sequences up to 2Mbp)
Deletion bits (21 bits): Track deletion count for gap penalty calculation
Score bits (remaining bits): Store the actual alignment score

Performance Characteristics

The algorithm has the following performance characteristics:

Linear space complexity: O(reference_length) memory usage regardless of query length
Reduced time complexity: Banding limits computation to O(bandwidth × query_length) instead of O(query_length × reference_length)
Cache efficiency: Sequential access patterns and compact data structures improve cache performance
Early termination: High-identity sequences with few mismatches use minimal bandwidth

RingBuffer Implementation

The algorithm uses RingBuffer with configurable ringSize=bandWidth0 to track recent alignment scores. scoreBonus calculation uses RING_MULT*min(ringSize, recentMissingScore) for bandwidth expansion when alignment quality degrades.

Sequence Length Limitations

WobbleAligner supports sequences up to 2 Megabases (2,097,151 bp) due to the 21-bit position encoding with POSITION_MASK=(1L << POSITION_BITS)-1. Length validation uses assert(ref.length<=POSITION_MASK).

Identity Calculation

The algorithm solves a system of equations to calculate exact operation counts:

M + S + I = query_length (matches + substitutions + insertions)
M + S + D = reference_alignment_length (matches + substitutions + deletions)
Score = M - S - I - D (match bonus minus penalties)

This provides accurate identity calculation without requiring traceback through the alignment matrix.

Output Format

WobbleAligner outputs alignment results in a concise format:

Identity: Alignment identity as a decimal value (0.0 to 1.0)
Reference start (rstart): Starting position of alignment in reference sequence (0-based)
Reference stop (rstop): Ending position of alignment in reference sequence (0-based)
Matrix file (optional): State space exploration visualization for analysis with visualizealignment.sh

Use Cases

High-identity alignments: Optimized for sequences with >90% identity
Glocal alignment: Finds best local alignment within sequences
Memory-constrained environments: Minimal memory footprint for large sequences
Alignment visualization: Matrix output enables detailed alignment analysis
Performance benchmarking: Built-in iteration support for timing studies
Research applications: Exact results suitable for publication-quality analysis

Technical Notes

WobbleAligner automatically swaps query and reference to ensure query ≤ reference length when position vector is not needed
The algorithm provides exact alignment results, not heuristic approximations
N characters receive special treatment with neutral scoring (score = 0)
Global alignment mode is available but disabled by default (GLOBAL=false)
The visualization matrix can be large for long sequences; use "null" to disable for performance testing
Thread-safe implementation with atomic loop counters for concurrent usage

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org