BandedAligner
Aligns a query sequence to a reference using BandedAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.
Basic Usage
bandedaligner.sh <query> <ref>
bandedaligner.sh <query> <ref> <map>
bandedaligner.sh <query> <ref> <map> <iterations> <simd>
Parameters
Parameters control the alignment process, including input/output settings, visualization options, and performance optimizations.
Required Parameters
- query
- A literal nucleotide sequence or fasta file. The query sequence to be aligned to the reference.
- ref
- A literal nucleotide sequence or fasta file. The reference sequence against which the query will be aligned.
Optional Parameters
- map
- Optional output text file for matrix score space visualization. Set to null for benchmarking with no visualization. This map can be fed to visualizealignment.sh to make an image showing the alignment process.
- iterations
- Optional integer for benchmarking multiple iterations. Useful for performance testing and timing comparisons.
- simd
- Enable SIMD (Single Instruction, Multiple Data) mode. Needs a large band to be effective. Improves performance for longer sequences when bandwidth is sufficiently large.
Examples
Basic Alignment
bandedaligner.sh ACGTACGTACGT ACGTACCTACGT
Aligns two literal sequences and outputs identity score and alignment coordinates.
File-based Alignment
bandedaligner.sh query.fasta reference.fasta
Aligns sequences from FASTA files.
Alignment with Visualization
bandedaligner.sh query.fasta reference.fasta alignment_map.txt
Performs alignment and outputs a visualization map that can be processed with visualizealignment.sh.
Performance Benchmarking
bandedaligner.sh ACGTACGTACGT ACGTACCTACGT null 1000 simd
Runs 1000 iterations with SIMD optimization enabled for benchmarking purposes.
Algorithm Details
Core Algorithm
BandedAligner implements a space-efficient banded alignment algorithm that restricts computations to a diagonal band around the expected alignment path. This approach significantly reduces memory usage and computational complexity while maintaining accuracy for high-identity alignments.
Technical Implementation
- Banded Approach: Uses adaptive bandwidth calculation based on sequence similarity to restrict the alignment search space
- Memory Efficiency: Employs only 2 arrays to avoid traceback, using bit-packed scoring to store position, deletion count, and score in a single long integer
- Exact Results: Provides exact alignment identity without approximation
- Position Tracking: Calculates rstart and rstop positions without full traceback using embedded position information
- Length Limitations: Limited to sequences up to 2Mbp due to 21-bit position encoding
Bandwidth Calculation
The algorithm automatically determines optimal bandwidth using the formula: min(60 + sqrt(refLength), 4 + max(queryLength, refLength)/8)
. For high-identity alignments with few indels, this adaptive approach ensures minimal bandwidth while maintaining accuracy.
Scoring System
- Match: +1 point for identical bases
- Mismatch: -1 point for different bases
- Insertion: -1 point for query insertions
- Deletion: -1 point for reference deletions
- N-handling: 0 points for alignments involving N characters
Performance Characteristics
- Time Complexity: O(n × bandwidth) where n is sequence length
- Space Complexity: O(reference_length) for two scoring arrays
- SIMD Optimization: Available for large bands to accelerate vector operations
- Optimal Use Case: High-identity alignments (>90%) with minimal indels
Output Format
The aligner outputs:
- Identity: Fraction of aligned bases that match (0.0-1.0)
- Reference Start: Starting position in the reference sequence
- Reference Stop: Ending position in the reference sequence
- Visualization Map: Optional matrix showing the alignment search space
Bit-field Encoding
The algorithm uses efficient bit-field encoding in 64-bit integers:
- Position Bits (21): Store reference position (supports up to 2M bases)
- Deletion Bits (21): Count deletions in alignment
- Score Bits (22): Store alignment score
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org