XDropHAligner
Aligns a query sequence to a reference using the XDropHAligner algorithm. The sequences can be any characters, but N is handled as a special case. Outputs the identity percentage, reference start position (rstart), and reference stop position (rstop). Optionally prints a state space exploration map that can be fed to visualizealignment.sh to create an image. This aligner uses only 2 arrays and avoids traceback while giving an exact answer. It calculates rstart and rstop positions without traceback and is limited to sequences up to 2Mbp in length due to 21 position bits.
Basic Usage
xdrophaligner.sh <query> <ref>
xdrophaligner.sh <query> <ref> <map>
xdrophaligner.sh <query> <ref> <map> <iterations>
XDropHAligner performs pairwise sequence alignment between a query and reference sequence. The tool accepts both literal nucleotide sequences and fasta files as input.
Parameters
Parameters control input sequences, visualization output, and benchmarking iterations.
Standard Parameters
- query
- A literal nucleotide sequence or fasta file. This is the sequence to be aligned against the reference.
- ref
- A literal nucleotide sequence or fasta file. This is the reference sequence that the query will be aligned to.
- map
- Optional output text file for matrix score space. Set to "null" for benchmarking with no visualization. This file can be fed to visualizealignment.sh to create an image of the alignment state space exploration.
- iterations
- Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.
Java Parameters
- -Xmx
- Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Output Format
Standard Output
The aligner outputs the following information to standard output:
- Identity: Percentage identity between query and reference (0.0-1.0 scale)
- rstart: Starting position in the reference sequence where optimal alignment begins
- rstop: Ending position in the reference sequence where optimal alignment ends
Map File Format
When a map file is specified, the tool outputs a state space exploration map showing the matrix scores during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.
Examples
Basic Alignment
xdrophaligner.sh ACGTACGTACGT ACGTACGTACGT
Align two literal sequences and output identity and position information.
Alignment with Fasta Files
xdrophaligner.sh query.fa reference.fa
Align sequences from fasta files.
Generate Visualization Map
xdrophaligner.sh query.fa ref.fa alignment_map.txt
Perform alignment and save the state space exploration map to a file for later visualization.
Benchmarking Mode
xdrophaligner.sh query.fa ref.fa null 1000
Run 1000 iterations without generating visualization output for performance benchmarking.
Complete Workflow with Visualization
xdrophaligner.sh query.fa ref.fa map.txt
visualizealignment.sh map.txt alignment_image.png
First generate the alignment map, then use visualizealignment.sh to create a graphical representation.
Algorithm Details
XDropH Algorithm
XDropHAligner implements an alignment algorithm with the following characteristics:
- Two-Array Design: Uses only current and previous row arrays, minimizing memory footprint
- No Traceback: Calculates alignment coordinates without storing the full traceback matrix
- Exact Results: Despite optimizations, produces exact alignment scores and positions
- Sparse Matrix: Tracks active positions using IntLists to avoid computing cells unlikely to be optimal
- X-Drop Pruning: Automatically terminates exploration when scores drop too far from the current maximum
Bit Field Storage
The algorithm uses a bit-packing scheme to store multiple values in a single long integer:
- Position Bits (21 bits): Stores reference starting position (supports sequences up to 2Mbp)
- Deletion Bits (21 bits): Tracks deletion count for alignment reconstruction
- Score Bits (22 bits): Stores alignment score in upper bits
This compact representation enables efficient storage and computation without separate data structures.
Scoring System
- Match: +1 score increment
- Substitution: -1 score penalty
- Insertion: -1 score penalty
- Deletion: -1 score penalty
- N Character: 0 score (neutral, special case handling)
Bandwidth Calculation
The algorithm uses adaptive bandwidth based on sequence characteristics:
bandwidth = min(qLen/4+2, max(qLen,rLen)/32, 12)
bandwidth = max(2, bandwidth) + 3
This heuristic adjusts search space based on sequence lengths, balancing accuracy and performance.
Memory Requirements
Memory usage is low due to the two-array design:
- Fixed Allocation: 200MB default (--mem=200m --mode=fixed)
- Arrays: Two long arrays of length (reference_length + 1)
- Active Lists: Two IntLists for tracking active positions
- No Traceback Matrix: Avoids O(query_length × reference_length) memory typical of dynamic programming
Performance Characteristics
- Time Complexity: O(query_length × active_bandwidth) due to sparse exploration
- Space Complexity: O(reference_length) due to two-array design
- Sequence Limit: Maximum 2,097,151 bp (2^21 - 1) due to 21-bit position encoding
- Branchless Computation: Uses conditional expressions instead of branches for better CPU pipeline efficiency
Special Features
- N-Character Handling: Treats N as neutral (no penalty, no reward), useful for ambiguous bases
- Visualizer Integration: Optional Visualizer class generates state space maps for debugging and analysis
- Identity Calculation: Solves system of equations to determine matches, substitutions, insertions, and deletions
- Position Extraction: Returns reference alignment coordinates without traceback using bit field storage
IDAligner Interface
XDropHAligner implements the IDAligner interface, making it interchangeable with other alignment algorithms in the BBTools suite. This allows systematic benchmarking and comparison of different alignment strategies.
Support
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org