XDropHAligner

Script: xdrophaligner.sh
Package: aligner
Class: XDropHAligner.java

Aligns a query sequence to a reference using the XDropHAligner algorithm. The sequences may contain any characters, but N is handled as a special case. Outputs the identity percentage, the reference start position (rstart), and the reference stop position (rstop). Optionally prints a state space exploration map that can be fed to visualizealignment.sh to create an image. The aligner uses only 2 arrays and computes rstart and rstop without traceback while still giving an exact answer. Because positions are stored in 21 bits, sequences are limited to about 2 Mbp in length (2^21 = 2,097,152 positions).

Basic Usage

xdrophaligner.sh <query> <ref>
xdrophaligner.sh <query> <ref> <map>
xdrophaligner.sh <query> <ref> <map> <iterations>

XDropHAligner performs pairwise sequence alignment between a query and reference sequence. The tool accepts both literal nucleotide sequences and fasta files as input.

Parameters

Parameters control input sequences, visualization output, and benchmarking iterations.

Standard Parameters

query
A literal nucleotide sequence or fasta file. This is the sequence to be aligned against the reference.
ref
A literal nucleotide sequence or fasta file. This is the reference sequence that the query will be aligned to.
map
Optional output text file for the matrix score space. Set to "null" to run benchmarks without producing a map. The file can be fed to visualizealignment.sh to create an image of the alignment state space exploration.
iterations
Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.
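
Because the parameters are positional, an iteration count can only be given when the map argument is also present. The sketch below shows one way a wrapper script might assemble the argument list. The function name build_command and its keyword arguments are hypothetical and are not part of BBTools; only the argument order and the "null" placeholder come from the usage above.

```python
def build_command(query, ref, map_path=None, iterations=None):
    """Assemble the positional argument list for xdrophaligner.sh.

    Hypothetical helper for illustration; not part of BBTools.
    """
    cmd = ["xdrophaligner.sh", query, ref]
    if iterations is not None:
        # iterations is the 4th positional argument, so the map slot must be
        # filled; "null" disables map output, as described above.
        cmd.append(map_path if map_path is not None else "null")
        cmd.append(str(iterations))
    elif map_path is not None:
        cmd.append(map_path)
    return cmd
```

The resulting list could be passed to subprocess.run on a system where xdrophaligner.sh is on the PATH.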

Java Parameters

-Xmx
Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Output Format

Standard Output

The aligner prints the identity percentage, the reference start position (rstart), and the reference stop position (rstop) to standard output.

Map File Format

When a map file is specified, the tool outputs a state space exploration map showing the matrix scores during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.

Examples

Basic Alignment

xdrophaligner.sh ACGTACGTACGT ACGTACGTACGT

Align two literal sequences and output identity and position information.

Alignment with Fasta Files

xdrophaligner.sh query.fa reference.fa

Align sequences from fasta files.

Generate Visualization Map

xdrophaligner.sh query.fa ref.fa alignment_map.txt

Perform alignment and save the state space exploration map to a file for later visualization.

Benchmarking Mode

xdrophaligner.sh query.fa ref.fa null 1000

Run 1000 iterations without generating visualization output for performance benchmarking.

Complete Workflow with Visualization

xdrophaligner.sh query.fa ref.fa map.txt
visualizealignment.sh map.txt alignment_image.png

First generate the alignment map, then use visualizealignment.sh to create a graphical representation.

Algorithm Details

XDropH Algorithm

XDropHAligner implements an alignment algorithm that uses only two arrays, computes rstart and rstop without traceback, and handles N as a special case.

Bit Field Storage

The algorithm uses a bit-packing scheme to store multiple values, including a 21-bit position field, in a single long integer.

This compact representation enables efficient storage and computation without separate data structures.
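
As an illustration of the general technique, the sketch below packs a score and a position into one integer. Only the 21-bit position width comes from this document. The layout (position in the low bits, score in the high bits) and the names pack and unpack are assumptions, and the actual field layout in XDropHAligner.java may differ. Python integers are unbounded, so the 64-bit limit of a Java long is not modeled here.

```python
POSITION_BITS = 21                          # from the text: 21 position bits
POSITION_MASK = (1 << POSITION_BITS) - 1    # low 21 bits set
MAX_POSITION = POSITION_MASK                # 2,097,151, hence the ~2 Mbp limit


def pack(score, position):
    """Store score in the high bits and position in the low 21 bits (assumed layout)."""
    if not 0 <= position <= MAX_POSITION:
        raise ValueError("position does not fit in 21 bits")
    return (score << POSITION_BITS) | position


def unpack(packed):
    """Recover (score, position) from a packed value."""
    return packed >> POSITION_BITS, packed & POSITION_MASK
```

With the score in the high bits, comparing two packed values compares their scores first, so taking the maximum of packed cells selects the best score and carries its position along without a separate array.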

Scoring System

Bandwidth Calculation

The algorithm derives its bandwidth from the query and reference lengths:

bandwidth = min(qLen/4+2, max(qLen,rLen)/32, 12)
bandwidth = max(2, bandwidth) + 3

This heuristic adjusts search space based on sequence lengths, balancing accuracy and performance.
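
The formula can be checked with a few lines of Python. This is a sketch that assumes integer (truncating) division, as would be typical for Java int arithmetic; the function name bandwidth is chosen here for illustration.

```python
def bandwidth(q_len, r_len):
    """Evaluate the bandwidth formula above, assuming integer division."""
    bw = min(q_len // 4 + 2, max(q_len, r_len) // 32, 12)
    return max(2, bw) + 3


# Short, medium, and long inputs
print(bandwidth(12, 12))      # 12-base sequences
print(bandwidth(100, 200))    # medium-length sequences
print(bandwidth(1000, 1000))  # long sequences
```

Under this assumption the result always falls between 5 and 15: the first line caps the value at 12, and the second enforces a floor of 2 before adding 3.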

Memory Requirements

Memory usage is low due to the two-array design. Because no traceback is performed, the full alignment matrix never needs to be stored.
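
To show why two arrays are enough when traceback is not needed, the sketch below computes a plain Levenshtein edit distance using two rolling rows. This is a generic textbook technique swapped in for illustration. It is not the XDropH algorithm, and it does not compute identity, rstart, or rstop.

```python
def edit_distance_two_rows(query, ref):
    """Levenshtein distance using two rows instead of a full matrix."""
    prev = list(range(len(ref) + 1))   # row for the previous query position
    curr = [0] * (len(ref) + 1)        # row being filled in
    for i, q in enumerate(query, start=1):
        curr[0] = i
        for j, r in enumerate(ref, start=1):
            cost = 0 if q == r else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # skip a query base
                          curr[j - 1] + 1)     # skip a reference base
        prev, curr = curr, prev        # reuse the old row on the next pass
    return prev[len(ref)]
```

Memory grows with the reference length only, rather than with the product of the two lengths.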

Performance Characteristics

Special Features

IDAligner Interface

XDropHAligner implements the IDAligner interface, making it interchangeable with other alignment algorithms in the BBTools suite. This allows systematic benchmarking and comparison of different alignment strategies.

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org