XDropHAligner

Script: xdrophaligner.sh Package: aligner Class: XDropHAligner.java

Aligns a query sequence to a reference using the XDropHAligner algorithm. The sequences can be any characters, but N is handled as a special case. Outputs the identity percentage, reference start position (rstart), and reference stop position (rstop). Optionally prints a state space exploration map that can be fed to visualizealignment.sh to create an image. This aligner uses only 2 arrays and avoids traceback while giving an exact answer. It calculates rstart and rstop positions without traceback and is limited to sequences up to 2Mbp in length due to 21 position bits.

Basic Usage

xdrophaligner.sh <query> <ref>
xdrophaligner.sh <query> <ref> <map>
xdrophaligner.sh <query> <ref> <map> <iterations>

XDropHAligner performs pairwise sequence alignment between a query and reference sequence. The tool accepts both literal nucleotide sequences and fasta files as input.

Parameters

Parameters control input sequences, visualization output, and benchmarking iterations.

Standard Parameters

query: A literal nucleotide sequence or fasta file. This is the sequence to be aligned against the reference.
ref: A literal nucleotide sequence or fasta file. This is the reference sequence that the query will be aligned to.
map: Optional output text file for matrix score space. Set to "null" for benchmarking with no visualization. This file can be fed to visualizealignment.sh to create an image of the alignment state space exploration.
iterations: Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.

Java Parameters

-Xmx: Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Output Format

Standard Output

The aligner outputs the following information to standard output:

Identity: Percentage identity between query and reference (0.0-1.0 scale)
rstart: Starting position in the reference sequence where optimal alignment begins
rstop: Ending position in the reference sequence where optimal alignment ends

Map File Format

When a map file is specified, the tool outputs a state space exploration map showing the matrix scores during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.

Examples

Basic Alignment

xdrophaligner.sh ACGTACGTACGT ACGTACGTACGT

Align two literal sequences and output identity and position information.

Alignment with Fasta Files

xdrophaligner.sh query.fa reference.fa

Align sequences from fasta files.

Generate Visualization Map

xdrophaligner.sh query.fa ref.fa alignment_map.txt

Perform alignment and save the state space exploration map to a file for later visualization.

Benchmarking Mode

xdrophaligner.sh query.fa ref.fa null 1000

Run 1000 iterations without generating visualization output for performance benchmarking.

Complete Workflow with Visualization

xdrophaligner.sh query.fa ref.fa map.txt
visualizealignment.sh map.txt alignment_image.png

First generate the alignment map, then use visualizealignment.sh to create a graphical representation.

Algorithm Details

XDropH Algorithm

XDropHAligner implements an alignment algorithm with the following characteristics:

Two-Array Design: Uses only current and previous row arrays, minimizing memory footprint
No Traceback: Calculates alignment coordinates without storing the full traceback matrix
Exact Results: Despite optimizations, produces exact alignment scores and positions
Sparse Matrix: Tracks active positions using IntLists to avoid computing cells unlikely to be optimal
X-Drop Pruning: Automatically terminates exploration when scores drop too far from the current maximum

Bit Field Storage

The algorithm uses a bit-packing scheme to store multiple values in a single long integer:

Position Bits (21 bits): Stores reference starting position (supports sequences up to 2Mbp)
Deletion Bits (21 bits): Tracks deletion count for alignment reconstruction
Score Bits (22 bits): Stores alignment score in upper bits

This compact representation enables efficient storage and computation without separate data structures.

Scoring System

Match: +1 score increment
Substitution: -1 score penalty
Insertion: -1 score penalty
Deletion: -1 score penalty
N Character: 0 score (neutral, special case handling)

Bandwidth Calculation

The algorithm uses adaptive bandwidth based on sequence characteristics:

bandwidth = min(qLen/4+2, max(qLen,rLen)/32, 12)
bandwidth = max(2, bandwidth) + 3

This heuristic adjusts search space based on sequence lengths, balancing accuracy and performance.

Memory Requirements

Memory usage is low due to the two-array design:

Fixed Allocation: 200MB default (--mem=200m --mode=fixed)
Arrays: Two long arrays of length (reference_length + 1)
Active Lists: Two IntLists for tracking active positions
No Traceback Matrix: Avoids O(query_length × reference_length) memory typical of dynamic programming

Performance Characteristics

Time Complexity: O(query_length × active_bandwidth) due to sparse exploration
Space Complexity: O(reference_length) due to two-array design
Sequence Limit: Maximum 2,097,151 bp (2^21 - 1) due to 21-bit position encoding
Branchless Computation: Uses conditional expressions instead of branches for better CPU pipeline efficiency

Special Features

N-Character Handling: Treats N as neutral (no penalty, no reward), useful for ambiguous bases
Visualizer Integration: Optional Visualizer class generates state space maps for debugging and analysis
Identity Calculation: Solves system of equations to determine matches, substitutions, insertions, and deletions
Position Extraction: Returns reference alignment coordinates without traceback using bit field storage

IDAligner Interface

XDropHAligner implements the IDAligner interface, making it interchangeable with other alignment algorithms in the BBTools suite. This allows systematic benchmarking and comparison of different alignment strategies.

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org