TestAligners2
Tests multiple aligners using random sequences. The sequences have variable pairwise ANI, and each ANI level is tested multiple times to average accuracy and loop count. Outputs the identity, rstart and rstop positions, time, and number of loops. Note that the 'design' ANI is approximate and will generally not match the measured ANI exactly.
Basic Usage
testaligners2.sh iterations=30 maxani=100 minani=90 step=2
This will test multiple alignment algorithms across ANI values from 100% to 90% in steps of 2%, running 30 iterations at each ANI level to get averaged performance statistics.
Parameters
The parameters below control sequence generation, testing conditions, and performance measurement for the alignment algorithm comparison.
- length=40k
  Length of random sequences to generate for testing. Default: 40,000 bases. Accepts standard suffixes (k=1000, m=1,000,000).
- iterations=32
  Number of iterations to average for each ANI level; higher values provide more accurate performance measurements but take longer to run. Default: 32.
- maxani=80
  Maximum ANI (Average Nucleotide Identity) percentage to test. The tool will start testing at this ANI level and decrease by step size. Default: 80%. Can be specified as decimal (0.8) or percentage (80).
- minani=30
  Minimum ANI percentage to test. The tool will stop testing when it reaches this ANI level. Default: 30%. Can be specified as decimal (0.3) or percentage (30).
- step=2
  ANI step size for decreasing from maxani to minani. Default: 2%. Can be specified as decimal (0.02) or percentage (2).
- sinewaves=0
  Number of sinewaves to model variable conservation patterns along the sequence. 0 means uniform mutation rate, values >0 create regions of varying conservation. Default: 0 (uniform).
- threads=
  Number of parallel alignment threads to use. Default: uses all logical CPU cores. Higher values speed up testing but may saturate system resources.
- simd
  Enable SIMD (Single Instruction, Multiple Data) vectorized operations for faster alignment computation. Requires AVX-256 instruction support and Java 17 or higher.
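To make the parameter conventions above concrete, here is a minimal Python sketch of how such values could be interpreted. It is not the tool's own parsing code. The function names are hypothetical, and two rules are assumptions: ANI values greater than 1 are read as percentages, and step values of 1 or more are read as percentages (so that step=1 means 1%). Including both maxani and minani in the sweep is also an assumption.

```python
def parse_length(text):
    """Parse a length such as '40k' or '2m' into an integer base count."""
    multipliers = {"k": 1_000, "m": 1_000_000}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def parse_ani(value):
    """Return an ANI as a fraction. Assumption: values > 1 are percentages."""
    value = float(value)
    return value / 100.0 if value > 1.0 else value


def parse_step(value):
    """Return a step as a fraction. Assumption: values >= 1 are percentages."""
    value = float(value)
    return value / 100.0 if value >= 1.0 else value


def ani_levels(maxani, minani, step):
    """List ANI fractions from maxani down to minani (both ends assumed included)."""
    hi, lo, st = parse_ani(maxani), parse_ani(minani), parse_step(step)
    if st <= 0 or hi < lo:
        raise ValueError("need step > 0 and maxani >= minani")
    # Derive the level count once instead of subtracting repeatedly,
    # so floating-point error does not accumulate across levels.
    count = int((hi - lo) / st + 1e-9) + 1
    return [round(hi - i * st, 6) for i in range(count)]


if __name__ == "__main__":
    print(parse_length("40k"))
    print(ani_levels(100, 90, 2))
```

Under these assumptions, the Basic Usage command (maxani=100 minani=90 step=2) would visit six ANI levels.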
Examples
Basic Performance Comparison
testaligners2.sh iterations=30 maxani=100 minani=90 step=2
Tests aligners from 100% to 90% ANI in 2% steps with 30 iterations per ANI level.
High-Resolution Analysis
testaligners2.sh iterations=50 maxani=95 minani=80 step=1 length=100k
Performs detailed analysis with 1% ANI steps, 50 iterations per level, and longer 100kb sequences for higher resolution results.
Conservation Model Testing
testaligners2.sh sinewaves=3 iterations=25 maxani=90 minani=70
Tests aligners with variable conservation patterns (3 sinewaves) that simulate realistic genomic variation rather than uniform mutation.
Fast Multi-threaded Run
testaligners2.sh threads=16 simd iterations=20 maxani=85 minani=75 step=5
Quick performance comparison using 16 threads, SIMD acceleration, and a larger step size for faster results.
Algorithm Details
Alignment Algorithm Testing Framework
TestAlignerSuite evaluates the performance of multiple sequence alignment algorithms by testing them against sequences with known ANI relationships. The framework generates pairs of sequences with controlled similarity levels and measures alignment accuracy and performance.
Tested Alignment Algorithms
The tool currently benchmarks nine different alignment algorithms:
- GlocalPlusAligner5 - Enhanced glocal (global-local) alignment algorithm
- BandedAligner - Banded dynamic programming alignment within constrained search space
- DriftingAligner - Adaptive alignment with drift compensation
- WobbleAligner - Flexible alignment handling sequence variations
- ScrabbleAligner - Alignment algorithm with specialized heuristics
- QuantumAligner - Quantum-inspired alignment algorithm
- QuabbleAligner - Specialized alignment for divergent sequences
- XDropHAligner - X-drop alignment with horizontal optimization
- WaveFrontAligner2 - Wavefront-based alignment algorithm
Sequence Generation and Mutation
The testing framework uses a multi-component mutation model implementing controlled evolutionary changes:
- Base Generation: Creates random reference sequences of specified length
- Mutation Distribution: Uses 75% substitutions, 12.5% deletions, 12.5% insertions to model realistic evolutionary changes
- Indel Modeling: Supports indels up to 9 bases with geometric length distribution
- Conservation Patterns: Optional sinewave modeling creates regions of variable conservation to simulate real genomic structure
- Homopolymer Avoidance: Prevents artificial homopolymer runs that could bias alignment results
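The mutation model described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names, the geometric stop probability `p_stop`, and the interpretation of `sinewaves` as the number of full sine periods modulating the mutation rate along the sequence are all hypothetical. Homopolymer avoidance is omitted because the source does not describe how it works.

```python
import math
import random

BASES = "ACGT"


def indel_length(rng, max_len=9, p_stop=0.5):
    """Geometric indel length capped at max_len (p_stop is an assumed value)."""
    length = 1
    while length < max_len and rng.random() >= p_stop:
        length += 1
    return length


def mutation_rate(pos, seq_len, base_rate, sinewaves=0):
    """Per-position mutation probability; uniform when sinewaves is 0.

    Assumption: the rate is scaled by (1 + sin) over `sinewaves` full periods.
    """
    if sinewaves <= 0:
        return base_rate
    phase = 2.0 * math.pi * sinewaves * pos / seq_len
    return min(1.0, max(0.0, base_rate * (1.0 + math.sin(phase))))


def mutate(ref, ani, rng, sinewaves=0):
    """Return a mutated copy of ref with a design ANI of roughly `ani`."""
    out = []
    i, n = 0, len(ref)
    while i < n:
        if rng.random() >= mutation_rate(i, n, 1.0 - ani, sinewaves):
            out.append(ref[i])  # position left unchanged
            i += 1
            continue
        kind = rng.random()
        if kind < 0.75:  # 75% substitutions: pick a different base
            out.append(rng.choice([b for b in BASES if b != ref[i]]))
            i += 1
        elif kind < 0.875:  # 12.5% deletions: skip reference bases
            i += indel_length(rng)
        else:  # 12.5% insertions: add random bases before this one
            out.extend(rng.choice(BASES) for _ in range(indel_length(rng)))
            out.append(ref[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(7)
    ref = "".join(rng.choice(BASES) for _ in range(60))
    print(ref)
    print(mutate(ref, 0.9, rng))
```

Because each mutation event is drawn at random and indels span several bases, the identity measured by an aligner will differ somewhat from the requested value, which is consistent with the note that the design ANI is approximate.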
Performance Metrics
For each aligner and ANI level, the framework measures:
- Alignment Identity: Actual ANI achieved by the alignment
- Reference Coordinates: Start and stop positions in reference sequence
- Loop Count: Number of computational loops (algorithm efficiency)
- State Space Utilization: Percentage of theoretical alignment matrix explored
- Runtime: Wall-clock time for alignment computation
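Two of these metrics can be written as short formulas. The Python sketch below is an illustration only: it assumes identity is the fraction of alignment columns that match, and that state space utilization is the loop count divided by the full query-by-reference matrix size. The tool's exact definitions may differ, and the function names are hypothetical.

```python
def identity(aligned_query, aligned_ref):
    """Fraction of alignment columns that are matches; '-' marks a gap."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != "-"
    )
    return matches / len(aligned_query)


def state_space_percent(loops, query_len, ref_len):
    """Loops as a percentage of the full query_len x ref_len matrix (assumed)."""
    return 100.0 * loops / (query_len * ref_len)


if __name__ == "__main__":
    print(identity("ACGT-A", "ACCTTA"))
    print(state_space_percent(400, 40, 100))
```

Under this definition, an aligner that visits every cell of the matrix scores 100%, and banded or heuristic aligners score lower.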
Multithreaded Architecture
The framework implements a job queue-based threading model using ConcurrentLinkedQueue and AtomicLong counters:
- Job Queue: ConcurrentLinkedQueue distributes work across threads
- Worker Threads: Each thread processes independent alignment jobs
- Result Aggregation: Thread-safe accumulation of performance statistics
- Loop Counting: Captures total loop counts before and after parallel execution
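The queue-and-workers pattern above can be sketched in Python as a rough analogue. The tool itself is described as using Java's ConcurrentLinkedQueue and AtomicLong; this sketch substitutes `queue.Queue` and a lock-protected accumulator, and the names `run_jobs` and `worker_fn` are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, worker_fn, threads=4):
    """Run worker_fn(job) for every job on a thread pool; return (sum, count)."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    totals = {"sum": 0.0, "count": 0}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = pending.get_nowait()  # take the next independent job
            except queue.Empty:
                return  # queue drained: this worker is done
            result = worker_fn(job)
            with lock:  # thread-safe accumulation of statistics
                totals["sum"] += result
                totals["count"] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return totals["sum"], totals["count"]


if __name__ == "__main__":
    total, count = run_jobs(range(100), lambda x: x * 2, threads=4)
    print(total / count)  # average result per job
```

All jobs are queued before the workers start, so an empty queue reliably signals completion and no sentinel values are needed.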
Statistical Analysis
Results are averaged across all iterations for each ANI level using double-precision accumulators. The framework reports both absolute metrics (time, loops) and relative efficiency measures (state space utilization percentage).
SIMD Acceleration
When enabled, SIMD operations use AVX-256 vectorized instructions to process multiple alignment operations simultaneously, providing 2-4x performance improvements on supported hardware (requires Java 17+).
Output Format
The tool outputs tab-delimited performance statistics for each aligner:
Aligner	ANI	rStart	rStop	Loops	Space%	Time
BandedAlign	85.2341	45	39955	1234567	78.9	0.123
- Aligner: Name of the alignment algorithm
- ANI: Measured average nucleotide identity (4 decimal places)
- rStart/rStop: Average reference sequence start/stop coordinates
- Loops: Average computational loops per alignment
- Space%: Percentage of theoretical state space explored
- Time: Average runtime in seconds (3 decimal places)
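For downstream analysis, output in this shape can be read with a few lines of Python. The sketch assumes the seven columns and header line shown above, separated by tabs; the actual output may contain additional lines or columns, and `parse_results` is a hypothetical helper name.

```python
import csv
import io

COLUMNS = ["Aligner", "ANI", "rStart", "rStop", "Loops", "Space%", "Time"]


def parse_results(text):
    """Parse tab-delimited result lines into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0] == "Aligner":
            continue  # skip blank lines and the header
        row = dict(zip(COLUMNS, fields))
        for key in COLUMNS[1:]:
            row[key] = float(row[key])  # averages may be fractional
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = (
        "Aligner\tANI\trStart\trStop\tLoops\tSpace%\tTime\n"
        "BandedAlign\t85.2341\t45\t39955\t1234567\t78.9\t0.123\n"
    )
    for row in parse_results(sample):
        print(row["Aligner"], row["ANI"], row["Time"])
```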
Performance Considerations
- Memory Usage: Scales with sequence length and number of threads. Longer sequences require more memory for alignment matrices.
- Runtime: Testing time grows roughly as iterations × number of ANI levels × number of aligners, where the number of ANI levels is about (maxani-minani)/step + 1. Consider reducing parameters for quick tests.
- Thread Scaling: Performance typically scales well up to the number of logical CPU cores, with diminishing returns beyond that point.
- SIMD Benefits: SIMD acceleration provides 2-4x performance improvements on compatible hardware but requires Java 17+ and AVX-256 support.
- ANI Range Selection: At high ANI values (>95%), aligners may differ less in performance than in challenging low-ANI scenarios.
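As a rough planning aid for the runtime consideration above, the number of alignments in a run can be estimated before launching it. This Python sketch assumes that both maxani and minani are included in the sweep and that all nine listed aligners run at every level; `total_alignments` is a hypothetical helper, and its arguments are percentages.

```python
def total_alignments(iterations, maxani, minani, step, aligners=9):
    """Estimate alignments per run: iterations x ANI levels x aligners."""
    if step <= 0 or maxani < minani:
        raise ValueError("need step > 0 and maxani >= minani")
    # Small epsilon guards against floating-point results like 4.999999.
    levels = int((maxani - minani) / step + 1e-9) + 1
    return iterations * levels * aligners


if __name__ == "__main__":
    # Basic Usage example: 30 iterations, 100% down to 90% in 2% steps.
    print(total_alignments(30, 100, 90, 2))
    # Documented defaults: 32 iterations, 80% down to 30% in 2% steps.
    print(total_alignments(32, 80, 30, 2))
```

Under these assumptions the documented defaults imply several times more alignments than the Basic Usage example, which is worth keeping in mind when choosing sequence length and thread count.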
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org