TestAligners2

Script: testaligners2.sh Package: aligner Class: TestAlignerSuite.java

Tests multiple aligners on pairs of random sequences. The pairs are generated at a range of ANI levels, and each level is tested multiple times so that accuracy and loop count can be averaged. For each aligner, the output gives the identity, the rstart and rstop positions, the time, and the number of loops. Note that the 'design' ANI is only an approximate target, so the measured ANI will generally differ from it.

Basic Usage

testaligners2.sh iterations=30 maxani=100 minani=90 step=2

This will test multiple alignment algorithms across ANI values from 100% to 90% in steps of 2%, running 30 iterations at each ANI level to get averaged performance statistics.
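The sweep described above can be sketched as a simple descending schedule. This is a minimal illustration, not the tool's code: the class and method names are hypothetical, and it assumes that minani itself is included as the last level when the step divides the range evenly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of TestAlignerSuite.
final class AniSchedule {

    /** Returns ANI levels from maxAni down to minAni (assumed inclusive) in steps of 'step'. */
    static List<Double> levels(double maxAni, double minAni, double step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        final double eps = 1e-9; // tolerance so a level equal to minAni is not lost to rounding
        List<Double> out = new ArrayList<>();
        for (int i = 0; ; i++) {
            // Compute each level from the index to avoid accumulating floating-point error.
            double ani = maxAni - i * step;
            if (ani < minAni - eps) {
                break;
            }
            out.add(ani);
        }
        return out;
    }
}
```

Under that assumption, maxani=100 minani=90 step=2 yields six levels: 100, 98, 96, 94, 92, and 90.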

Parameters

All parameters control the sequence generation, testing conditions, and performance measurement for the alignment algorithm comparison.

length=40k: Length of random sequences to generate for testing. Default: 40,000 bases. Accepts standard suffixes (k=1000, m=1,000,000).
iterations=32: Number of iterations to average for each ANI level; higher values provide more accurate performance measurements but take longer to run. Default: 32.
maxani=80: Maximum ANI (Average Nucleotide Identity) percentage to test. The tool will start testing at this ANI level and decrease by step size. Default: 80%. Can be specified as decimal (0.8) or percentage (80).
minani=30: Minimum ANI percentage to test. The tool will stop testing when it reaches this ANI level. Default: 30%. Can be specified as decimal (0.3) or percentage (30).
step=2: ANI step size for decreasing from maxani to minani. Default: 2%. Can be specified as decimal (0.02) or percentage (2).
sinewaves=0: Number of sinewaves to model variable conservation patterns along the sequence. 0 means uniform mutation rate, values >0 create regions of varying conservation. Default: 0 (uniform).
threads=: Number of parallel alignment threads to use. Default: all logical CPU cores. Higher values speed up testing but may saturate system resources.
simd: Enable SIMD (Single Instruction, Multiple Data) vectorized operations for faster alignment computation. Requires AVX-256 instruction support and Java 17 or higher.
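The suffix and decimal-or-percentage conventions in the list above could be handled along these lines. This is a sketch under stated assumptions rather than the tool's parser: the class and method names are hypothetical, and the rule that any value greater than 1 is a percentage is an assumption (so a bare 1 is read as 100%).

```java
import java.util.Locale;

// Hypothetical parsing helpers; the real tool's parser may differ.
final class ParamSketch {

    /** Parses a length such as "40k" or "2m" (k=1000, m=1,000,000). */
    static long parseLength(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        if (t.endsWith("k")) {
            multiplier = 1_000L;
            t = t.substring(0, t.length() - 1);
        } else if (t.endsWith("m")) {
            multiplier = 1_000_000L;
            t = t.substring(0, t.length() - 1);
        }
        return Math.round(Double.parseDouble(t) * multiplier);
    }

    /** Parses an ANI or step given as a fraction (0.8) or a percentage (80); returns a fraction. */
    static double parseAniFraction(String s) {
        double v = Double.parseDouble(s.trim());
        // Assumption: values above 1 are percentages, everything else is already a fraction.
        return v > 1.0 ? v / 100.0 : v;
    }
}
```

With these rules, length=40k becomes 40000, and maxani=80 and maxani=0.8 both become the fraction 0.8.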

Examples

Basic Performance Comparison

testaligners2.sh iterations=30 maxani=100 minani=90 step=2

Tests aligners from 100% to 90% ANI in 2% steps with 30 iterations per ANI level.

High-Resolution Analysis

testaligners2.sh iterations=50 maxani=95 minani=80 step=1 length=100k

Performs a finer-grained analysis with 1% ANI steps, 50 iterations per level, and longer 100 kb sequences.

Conservation Model Testing

testaligners2.sh sinewaves=3 iterations=25 maxani=90 minani=70

Tests aligners with variable conservation patterns (3 sinewaves) that simulate realistic genomic variation rather than uniform mutation.
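One way to picture sinewave-based conservation is a per-position mutation rate that oscillates around the base rate. The sketch below is purely illustrative: the source does not say how the waves are parameterized, so the harmonic frequencies, the amplitude parameter, and all names here are assumptions.

```java
// Illustrative model only; the frequencies and amplitude are assumed, not taken from the tool.
final class ConservationSketch {

    /**
     * Mutation rate at a position. With waves == 0 the rate is uniform.
     * Otherwise the base rate is scaled by the average of 'waves' sine terms,
     * so the result stays within baseRate * (1 - amplitude) .. baseRate * (1 + amplitude).
     */
    static double rateAt(int pos, int length, double baseRate, int waves, double amplitude) {
        if (waves <= 0) {
            return baseRate;
        }
        double sum = 0.0;
        for (int k = 1; k <= waves; k++) {
            // Wave k completes k full cycles over the sequence (an assumed choice).
            sum += Math.sin(2.0 * Math.PI * k * pos / (double) length);
        }
        double modulation = amplitude * (sum / waves); // lies in [-amplitude, +amplitude]
        return baseRate * (1.0 + modulation);
    }
}
```

Positions where the modulation is negative behave like conserved regions, and positions where it is positive behave like fast-evolving ones.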

Fast Multi-threaded Run

testaligners2.sh threads=16 simd iterations=20 maxani=85 minani=75 step=5

Quick performance comparison using 16 threads, SIMD acceleration, and a coarse 5% step so that fewer ANI levels are tested.

Algorithm Details

Alignment Algorithm Testing Framework

TestAlignerSuite evaluates the performance of multiple sequence alignment algorithms by testing them against sequences with known ANI relationships. The framework generates pairs of sequences with controlled similarity levels and measures alignment accuracy and performance.

Tested Alignment Algorithms

The tool currently benchmarks nine different alignment algorithms:

GlocalPlusAligner5 - Enhanced glocal (global-local) alignment algorithm
BandedAligner - Banded dynamic programming alignment within constrained search space
DriftingAligner - Adaptive alignment with drift compensation
WobbleAligner - Flexible alignment handling sequence variations
ScrabbleAligner - Alignment algorithm with specialized heuristics
QuantumAligner - Quantum-inspired alignment algorithm
QuabbleAligner - Specialized alignment for divergent sequences
XDropHAligner - X-drop alignment with horizontal optimization
WaveFrontAligner2 - Wavefront-based alignment algorithm

Sequence Generation and Mutation

The testing framework uses a multi-component mutation model implementing controlled evolutionary changes:

Base Generation: Creates random reference sequences of specified length
Mutation Distribution: Uses 75% substitutions, 12.5% deletions, 12.5% insertions to model realistic evolutionary changes
Indel Modeling: Supports indels up to 9 bases with geometric length distribution
Conservation Patterns: Optional sinewave modeling creates regions of variable conservation to simulate real genomic structure
Homopolymer Avoidance: Prevents artificial homopolymer runs that could bias alignment results
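The mutation mix listed above can be sketched as follows. This is a minimal illustration, not the tool's generator: the names are hypothetical, the geometric indel length uses an assumed continuation probability of 0.5, a per-base mutation rate of roughly 1 minus the design ANI is assumed, and homopolymer avoidance and sinewave modulation are left out.

```java
import java.util.Random;

// Hypothetical sketch of the 75/12.5/12.5 mutation mix; not the tool's implementation.
final class MutationSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};
    private static final int MAX_INDEL = 9;

    /** Generates a uniformly random reference sequence. */
    static String randomSequence(int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASES[rnd.nextInt(4)]);
        }
        return sb.toString();
    }

    /** Geometric indel length capped at 9 (continuation probability 0.5 is an assumption). */
    static int indelLength(Random rnd) {
        int len = 1;
        while (len < MAX_INDEL && rnd.nextBoolean()) {
            len++;
        }
        return len;
    }

    /** Mutates each reference position with probability 'rate'. */
    static String mutate(String ref, double rate, Random rnd) {
        StringBuilder sb = new StringBuilder(ref.length());
        int i = 0;
        while (i < ref.length()) {
            char c = ref.charAt(i);
            if (rnd.nextDouble() >= rate) { // no mutation: copy the base
                sb.append(c);
                i++;
                continue;
            }
            double type = rnd.nextDouble();
            if (type < 0.75) {              // substitution: pick a different base
                char r;
                do {
                    r = BASES[rnd.nextInt(4)];
                } while (r == c);
                sb.append(r);
                i++;
            } else if (type < 0.875) {      // deletion: skip 1..9 reference bases
                i += indelLength(rnd);
            } else {                        // insertion: add 1..9 random bases, then copy
                int len = indelLength(rnd);
                for (int k = 0; k < len; k++) {
                    sb.append(BASES[rnd.nextInt(4)]);
                }
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

Because indels change several bases per event and random substitutions can be hidden by chance matches elsewhere in the alignment, the identity an aligner measures on such a pair will not equal the requested rate exactly, which is consistent with the design ANI being approximate.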

Performance Metrics

For each aligner and ANI level, the framework measures:

Alignment Identity: Actual ANI achieved by the alignment
Reference Coordinates: Start and stop positions in reference sequence
Loop Count: Number of computational loops (algorithm efficiency)
State Space Utilization: Percentage of theoretical alignment matrix explored
Runtime: Wall-clock time for alignment computation

Multithreaded Architecture

The framework implements a job queue-based threading model using ConcurrentLinkedQueue and AtomicLong counters:

Job Queue: ConcurrentLinkedQueue distributes work across threads
Worker Threads: Each thread processes independent alignment jobs
Result Aggregation: Thread-safe accumulation of performance statistics
Loop Counting: Captures total loop counts before and after parallel execution, so that the difference gives the loops performed during the run
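The pattern above can be sketched with the two JDK classes the text names. Everything else here is a stand-in: the Job class, the trivial position-by-position comparison used in place of a real aligner, and the method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a queue-driven worker pool; not the tool's code.
final class JobQueueSketch {

    /** Hypothetical job: one query/reference pair to compare. */
    static final class Job {
        final String query;
        final String ref;
        Job(String query, String ref) {
            this.query = query;
            this.ref = ref;
        }
    }

    /** Stand-in for an aligner: counts identical positions, one loop per compared position. */
    static long countMatches(Job job, AtomicLong loops) {
        int n = Math.min(job.query.length(), job.ref.length());
        long matches = 0;
        for (int i = 0; i < n; i++) {
            if (job.query.charAt(i) == job.ref.charAt(i)) {
                matches++;
            }
        }
        loops.addAndGet(n);
        return matches;
    }

    /** Runs all jobs on 'threads' workers; returns {total matches, loops done during the run}. */
    static long[] runAll(List<Job> jobs, int threads) {
        final ConcurrentLinkedQueue<Job> queue = new ConcurrentLinkedQueue<>(jobs);
        final AtomicLong matches = new AtomicLong();
        final AtomicLong loops = new AtomicLong();
        final long loopsBefore = loops.get(); // snapshot before parallel execution

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                Job job;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((job = queue.poll()) != null) {
                    matches.addAndGet(countMatches(job, loops));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return new long[] {matches.get(), loops.get() - loopsBefore};
    }
}
```

Since every job is fully queued before the workers start, an empty poll() is a safe termination signal, and the atomic counters make the totals independent of how jobs are split across threads.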

Statistical Analysis

Results are averaged across all iterations for each ANI level using double-precision accumulators. The framework reports both absolute metrics (time, loops) and relative efficiency measures (state space utilization percentage).
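As a rough illustration of this averaging, the sketch below keeps double-precision running sums and derives the means on demand. The names are hypothetical, and taking the theoretical state space to be query length times reference length (the full alignment matrix) is an assumption.

```java
// Hypothetical accumulator; names and the state-space definition are assumptions.
final class StatsSketch {
    private double identitySum;
    private double loopSum;
    private double secondsSum;
    private int count;

    /** Adds the result of one alignment. */
    void add(double identity, long loops, double seconds) {
        identitySum += identity;
        loopSum += loops;
        secondsSum += seconds;
        count++;
    }

    double meanIdentity() { return identitySum / count; }
    double meanLoops()    { return loopSum / count; }
    double meanSeconds()  { return secondsSum / count; }

    /** Loops as a percentage of the full queryLength x refLength matrix (assumed definition). */
    static double stateSpacePercent(double loops, int queryLength, int refLength) {
        return 100.0 * loops / ((double) queryLength * refLength);
    }
}
```

For instance, 200 loops on a 20 x 20 matrix corresponds to 50% of the state space under this definition.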

SIMD Acceleration

When enabled, SIMD operations use AVX-256 vectorized instructions to process multiple alignment operations simultaneously, providing 2-4x performance improvements on supported hardware (requires Java 17+).

Output Format

The tool outputs tab-delimited performance statistics for each aligner:

Aligner      ANI      rStart  rStop  Loops    Space%  Time
BandedAlign  85.2341  45      39955  1234567  78.9    0.123

Aligner: Name of the alignment algorithm
ANI: Measured average nucleotide identity (4 decimal places)
rStart/rStop: Average reference sequence start/stop coordinates
Loops: Average computational loops per alignment
Space%: Percentage of theoretical state space explored
Time: Average runtime in seconds (3 decimal places)
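A row in this layout could be produced with a single format string. This is a sketch only: the method name is hypothetical, the one-decimal Space% precision is inferred from the sample row rather than stated, and the coordinates are treated as whole numbers as in the sample.

```java
import java.util.Locale;

// Hypothetical formatter matching the column layout shown above.
final class OutputSketch {

    /** Formats one tab-delimited result row. */
    static String formatRow(String aligner, double ani, long rStart, long rStop,
                            long loops, double spacePercent, double seconds) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        return String.format(Locale.ROOT, "%s\t%.4f\t%d\t%d\t%d\t%.1f\t%.3f",
                aligner, ani, rStart, rStop, loops, spacePercent, seconds);
    }
}
```

Calling formatRow("BandedAlign", 85.2341, 45, 39955, 1234567, 78.9, 0.123) reproduces the sample row with tabs between the fields.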

Performance Considerations

Memory Usage: Scales with sequence length and number of threads. Longer sequences require more memory for alignment matrices.
Runtime: Total testing time grows roughly in proportion to iterations × (maxani-minani)/step × number of aligners. For quick tests, reduce iterations, narrow the ANI range, or increase step.
Thread Scaling: Performance typically scales well up to the number of logical CPU cores, with diminishing returns beyond that point.
SIMD Benefits: SIMD acceleration provides 2-4x performance improvements on compatible hardware but requires Java 17+ and AVX-256 support.
ANI Range Selection: At high ANI (>95%), aligners may differ less in performance than they do in challenging low-ANI scenarios.
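To get a feel for how the runtime factors multiply, the number of alignments in a run can be estimated up front. This sketch uses hypothetical names and assumes that both maxani and minani are tested, that every aligner runs once per iteration at each level, and that the ANI values are whole percentages.

```java
// Hypothetical estimate of workload size; the inclusive range is an assumption.
final class RuntimeEstimate {

    /** Number of ANI levels from maxPct down to minPct, both assumed to be tested. */
    static int aniLevels(int maxPct, int minPct, int stepPct) {
        return (maxPct - minPct) / stepPct + 1;
    }

    /** Total alignments = iterations x levels x aligners. */
    static long totalAlignments(int iterations, int maxPct, int minPct, int stepPct, int aligners) {
        return (long) iterations * aniLevels(maxPct, minPct, stepPct) * aligners;
    }
}
```

Under these assumptions, iterations=30 maxani=100 minani=90 step=2 with nine aligners comes to 30 × 6 × 9 = 1620 alignments, while iterations=50 maxani=95 minani=80 step=1 comes to 50 × 16 × 9 = 7200.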

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org