TestAligners
Aligns a query sequence to a reference using multiple aligners. For each aligner, outputs the identity (ANI), the rStart and rStop positions on the reference, the elapsed time, and the number of loops.
Basic Usage
testaligners.sh <query> <ref>
testaligners.sh <query> <ref> <iterations> <threads> <simd>
TestAligners is a benchmarking tool that compares the performance of multiple alignment algorithms on a given query-reference pair. It tests various aligner implementations and reports their accuracy, speed, and computational efficiency.
Parameters
Parameters are positional arguments that control the alignment benchmarking process.
- query
- A literal DNA/RNA sequence string (e.g., "ATCG...") or a path to a FASTA file containing the query sequence.
- ref
- A literal DNA/RNA sequence string (e.g., "ATCG...") or a path to a FASTA file containing the reference sequence.
- iterations
- Optional integer giving the number of alignment iterations performed for timing measurements. Default is 400 for sequences under 500bp; the count is adjusted automatically for longer sequences.
- threads
- Number of parallel instances to use. Default is 1. When greater than 1, enables multi-threaded benchmarking for performance testing under concurrent conditions.
- simd
- Boolean flag that enables SIMD (Single Instruction, Multiple Data) optimizations for the alignment algorithms that support them. Requires AVX-256 and Java 17+.
Examples
Basic Sequence Alignment
testaligners.sh "ATCGATCGATCG" "ATCGATCGATCG"
Tests all available aligners on two identical 12bp sequences, showing perfect identity scores.
FASTA File Input
testaligners.sh query.fasta reference.fasta
Aligns sequences from FASTA files using all available alignment algorithms.
Benchmarking with Multiple Iterations
testaligners.sh "ATCGATCGATCG" "ATCGATCGATCG" 1000
Performs 1000 alignment iterations for more precise timing measurements.
Multi-threaded Benchmarking
testaligners.sh query.fasta ref.fasta 500 4
Tests aligners with 500 iterations using 4 parallel threads.
SIMD-Enabled Performance Testing
testaligners.sh query.fasta ref.fasta 1000 1 simd
Enables SIMD optimizations (requires Java 17+ and AVX-256 support) for maximum performance testing.
Algorithm Details
Tested Alignment Algorithms
TestAligners benchmarks 8 different alignment algorithm implementations available in the aligner package, each using distinct computational approaches:
- GlocalAligner: Traceback-free global alignment using 64-bit bit-packing (21-bit position + 21-bit deletion + 22-bit score), with reduced iterations for sequences ≥500bp
- BandedAligner: Band-limited dynamic programming implementation
- DriftingAligner: Alignment with adaptive banding capabilities
- WobbleAligner: Alignment algorithm variant for indel handling
- QuantumAligner: Research alignment algorithm implementation
- QuabbleAligner: Alternative alignment algorithm implementation
- XDropHAligner: X-drop extension-based alignment algorithm
- WaveFrontAligner2: Wavefront-based alignment algorithm implementation
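The 64-bit packing mentioned for GlocalAligner (21-bit position + 21-bit deletion + 22-bit score) can be illustrated with a minimal sketch. The field order (score in the high bits, position in the low bits), the class name PackedCellSketch, and the treatment of all fields as unsigned are assumptions made for illustration; the actual layout used by GlocalAligner may differ.

```java
// Illustrative sketch only: packs three unsigned fields into one long.
// Assumed layout (high to low bits): [score:22][deletions:21][position:21].
final class PackedCellSketch {
    static final int POS_BITS = 21;
    static final int DEL_BITS = 21;
    static final int SCORE_BITS = 22;

    static final int DEL_SHIFT = POS_BITS;               // 21
    static final int SCORE_SHIFT = POS_BITS + DEL_BITS;  // 42

    static final long POS_MASK = (1L << POS_BITS) - 1;
    static final long DEL_MASK = (1L << DEL_BITS) - 1;
    static final long SCORE_MASK = (1L << SCORE_BITS) - 1;

    /** Packs the three fields; values wider than their field are truncated. */
    static long pack(int score, int deletions, int position) {
        return ((score & SCORE_MASK) << SCORE_SHIFT)
             | ((deletions & DEL_MASK) << DEL_SHIFT)
             | (position & POS_MASK);
    }

    /** Extracts the 22-bit score from the top of the word. */
    static int score(long packed) {
        return (int) (packed >>> SCORE_SHIFT);
    }

    /** Extracts the 21-bit deletion count from the middle of the word. */
    static int deletions(long packed) {
        return (int) ((packed >>> DEL_SHIFT) & DEL_MASK);
    }

    /** Extracts the 21-bit position from the bottom of the word. */
    static int position(long packed) {
        return (int) (packed & POS_MASK);
    }
}
```

Under this assumed layout, Long.compareUnsigned on two packed values ranks them by score first, so a single comparison can pick the better cell while the position and deletion count travel along with it.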
Performance Metrics
For each aligner, TestAligners reports performance metrics extracted from the alignment process:
- ANI (Average Nucleotide Identity): Alignment accuracy as a fraction (0.0-1.0)
- rStart/rStop: Alignment coordinates on the reference sequence
- Loops: Number of computational loops performed (efficiency metric)
- Space%: Percentage of dynamic programming matrix explored
- Time: Wall-clock execution time in seconds
Adaptive Benchmarking
The tool automatically adjusts iteration counts based on sequence length to provide meaningful timing data while avoiding excessive computation:
- Short sequences (<500bp): Full iteration count for precise timing
- Long sequences (≥500bp): Reduced iterations to prevent excessive runtime
- Multi-threaded mode: Balanced workload distribution across threads
Validation Framework
TestAligners includes validation using assertion-based test cases in the validate() method:
- Perfect matches: "A" vs "A" → identity = 1.0
- Complete mismatches: "T" vs "A" → identity = 0.0
- Indel handling: "AGA" vs "AA" → identity = 0.667
- Complex alignments with multiple gaps and mismatches
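The expected values above are consistent with identity computed as matches divided by alignment columns: "AGA" vs "AA" aligns as two matches plus one gap, giving 2/3 ≈ 0.667. The sketch below reproduces those three values with a plain global alignment. It is not one of the aligners tested by TestAligners; the class name IdentitySketch and the scoring (+1 match, -1 mismatch, -1 gap) are assumptions chosen only to make the arithmetic concrete.

```java
// Illustrative sketch only: simple global alignment that tracks, for the
// best-scoring path to each cell, the number of matches and alignment columns.
// Assumed scoring: +1 match, -1 mismatch, -1 gap; ties prefer the diagonal.
final class IdentitySketch {
    static double identity(String query, String ref) {
        int q = query.length(), r = ref.length();
        int[][] score = new int[q + 1][r + 1];
        int[][] matches = new int[q + 1][r + 1];
        int[][] columns = new int[q + 1][r + 1];

        // First row and column consist of gaps only.
        for (int i = 1; i <= q; i++) { score[i][0] = -i; columns[i][0] = i; }
        for (int j = 1; j <= r; j++) { score[0][j] = -j; columns[0][j] = j; }

        for (int i = 1; i <= q; i++) {
            for (int j = 1; j <= r; j++) {
                boolean same = query.charAt(i - 1) == ref.charAt(j - 1);
                int diag = score[i - 1][j - 1] + (same ? 1 : -1);
                int up = score[i - 1][j] - 1;    // gap in the reference
                int left = score[i][j - 1] - 1;  // gap in the query
                if (diag >= up && diag >= left) {
                    score[i][j] = diag;
                    matches[i][j] = matches[i - 1][j - 1] + (same ? 1 : 0);
                    columns[i][j] = columns[i - 1][j - 1] + 1;
                } else if (up >= left) {
                    score[i][j] = up;
                    matches[i][j] = matches[i - 1][j];
                    columns[i][j] = columns[i - 1][j] + 1;
                } else {
                    score[i][j] = left;
                    matches[i][j] = matches[i][j - 1];
                    columns[i][j] = columns[i][j - 1] + 1;
                }
            }
        }
        int cols = columns[q][r];
        return cols == 0 ? 0.0 : matches[q][r] / (double) cols;
    }
}
```

For example, IdentitySketch.identity("AGA", "AA") counts 2 matches over 3 alignment columns.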
SIMD Support
SIMD functionality is controlled by the Shared.SIMD boolean flag. When enabled via the simd command-line parameter, it activates vector processing in the aligners that implement it. Requirements:
- Java 17 or later with Vector API support
- CPU with AVX-256 instruction set
- Properly configured JVM with vector intrinsics enabled
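Two of these requirements can be checked from inside a running JVM. The sketch below is a hypothetical helper (SimdCheckSketch is not part of TestAligners) that reports the Java feature version and whether the incubating Vector API module, jdk.incubator.vector, was resolved at startup. Incubator modules are only resolved when requested, typically with --add-modules jdk.incubator.vector. The check does not detect CPU support for AVX, which is not visible through a standard Java API.

```java
// Illustrative sketch only: checks the JVM-side SIMD prerequisites.
final class SimdCheckSketch {
    /** Major Java version of the running JVM, e.g. 17. */
    static int javaFeatureVersion() {
        return Runtime.version().feature();
    }

    /** True if the incubating Vector API module was resolved at startup. */
    static boolean vectorModulePresent() {
        return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    }

    /** True if both JVM-side requirements hold; says nothing about the CPU. */
    static boolean jvmSideReady() {
        return javaFeatureVersion() >= 17 && vectorModulePresent();
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + javaFeatureVersion());
        System.out.println("jdk.incubator.vector resolved: " + vectorModulePresent());
    }
}
```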
Multi-threading Implementation
Multi-threaded execution uses an ExecutorService thread pool, with createNewInstance() using reflection to create an independent aligner copy for each thread. An AtomicLong iteration counter coordinates work distribution; it is advanced with getAndIncrement(), which is safe under concurrent access. Each thread makes local copies of the sequences with Arrays.copyOf() to prevent data races.
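The pattern described here can be sketched as follows. This is a simplified illustration, not the TestAligners source: the Aligner interface, the run() method, and the use of a Supplier in place of reflection-based createNewInstance() are all assumptions introduced for the example.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative sketch only: threads share one iteration counter and each
// works on its own aligner instance and its own copies of the sequences.
final class ParallelBenchSketch {
    /** Hypothetical stand-in for an aligner; returns an identity value. */
    interface Aligner {
        float align(byte[] query, byte[] ref);
    }

    /** Runs the requested iterations across the threads; returns how many completed. */
    static long run(Supplier<Aligner> factory, byte[] query, byte[] ref,
                    long iterations, int threads) {
        AtomicLong next = new AtomicLong();       // shared iteration counter
        AtomicLong completed = new AtomicLong();  // tally for verification
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Aligner aligner = factory.get();  // independent instance per thread
                byte[] q = Arrays.copyOf(query, query.length);  // thread-local copies
                byte[] r = Arrays.copyOf(ref, ref.length);
                // Each getAndIncrement() claims one iteration atomically.
                while (next.getAndIncrement() < iterations) {
                    aligner.align(q, r);
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("benchmark timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }
}
```

Because every iteration is claimed through the shared counter, faster threads simply take more iterations, and the total performed equals the requested count regardless of the thread count.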
Output Format
TestAligners produces tabular output with alignment results for each algorithm:
Name         ANI     rStart  rStop  Loops  Space%   Time
GlocalAlign  1.0000  0       12     156    100.000  0.001
BandedAlign  1.0000  0       12     156    45.830   0.001
Output columns explained:
- Name: Alignment algorithm identifier
- ANI: Average Nucleotide Identity (0.0-1.0)
- rStart: Start position on reference sequence
- rStop: End position on reference sequence
- Loops: Computational cycles per iteration
- Space%: Percentage of alignment matrix explored
- Time: Execution time in seconds
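For downstream processing, a result row can be read into typed fields. The sketch below assumes the seven columns appear in the order listed and are separated by whitespace (spaces or tabs); the exact delimiter is an assumption, and ResultRowSketch is a hypothetical helper, not part of TestAligners.

```java
// Illustrative sketch only: parses one whitespace-delimited result row.
final class ResultRowSketch {
    final String name;
    final double ani;
    final int rStart;
    final int rStop;
    final long loops;
    final double spacePercent;
    final double seconds;

    private ResultRowSketch(String name, double ani, int rStart, int rStop,
                            long loops, double spacePercent, double seconds) {
        this.name = name;
        this.ani = ani;
        this.rStart = rStart;
        this.rStop = rStop;
        this.loops = loops;
        this.spacePercent = spacePercent;
        this.seconds = seconds;
    }

    /** Parses a data row (not the header line) into its seven fields. */
    static ResultRowSketch parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 columns, got " + f.length);
        }
        return new ResultRowSketch(
                f[0],
                Double.parseDouble(f[1]),
                Integer.parseInt(f[2]),
                Integer.parseInt(f[3]),
                Long.parseLong(f[4]),
                Double.parseDouble(f[5]),
                Double.parseDouble(f[6]));
    }
}
```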
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org