WaveFrontAlignerViz

Script: wavefrontalignerviz.sh Package: aligner Class: WaveFrontAlignerViz.java

Aligns a query sequence to a reference using the WaveFront alignment algorithm. The sequences can be any characters, but N is treated as a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map that can be fed to visualizealignment.sh to generate alignment visualization images.

Basic Usage

wavefrontalignerviz.sh <query> <ref>
wavefrontalignerviz.sh <query> <ref> <map>
wavefrontalignerviz.sh <query> <ref> <map> <iterations>

WaveFrontAlignerViz performs global sequence alignment using the WaveFront algorithm, which efficiently computes edit distance-based alignments. The tool can process both literal nucleotide sequences and sequences from fasta files.

Parameters

Parameters are positional arguments provided in the order shown in the usage examples.

Standard Parameters

query: A literal nucleotide sequence or fasta file containing the query sequence to align.
ref: A literal nucleotide sequence or fasta file containing the reference sequence to align against.
map: Optional output text file for matrix score space visualization. This file contains the state space exploration data and can be processed by visualizealignment.sh to create alignment images. Set to null for benchmarking with no visualization output.
iterations: Optional integer for benchmarking multiple iterations. Runs the alignment multiple times to measure performance characteristics.

Java Parameters

-Xmx: Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Output Format

Standard Output

The tool prints alignment results to standard output, including:

Identity: The fraction of aligned positions that match (matches / total aligned positions)
rstart: Starting position in the reference sequence (0-based)
rstop: Ending position in the reference sequence (0-based, inclusive)

Map File Format

When a map file is specified, the tool generates a text representation of the edit distance matrix showing the state space exploration during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.

Examples

Basic Alignment with Literal Sequences

wavefrontalignerviz.sh ACGTACGT ACGTTACGT

Aligns two literal nucleotide sequences and outputs identity and alignment positions.

Alignment with Fasta Files

wavefrontalignerviz.sh query.fa reference.fa

Aligns sequences from fasta files.

Generate Visualization Map

wavefrontalignerviz.sh query.fa reference.fa alignment_map.txt

Performs alignment and writes state space exploration data to alignment_map.txt for later visualization.

Benchmarking with Multiple Iterations

wavefrontalignerviz.sh query.fa reference.fa null 1000

Runs alignment 1000 times without generating visualization output, useful for performance benchmarking.

Complete Workflow with Visualization

wavefrontalignerviz.sh query.fa reference.fa map.txt
visualizealignment.sh map.txt alignment_image.png

First generates the alignment map, then creates a visual representation of the alignment process.

Algorithm Details

WaveFront Alignment Algorithm

WaveFrontAlignerViz implements a global alignment algorithm based on the WaveFront approach, which efficiently computes edit distance-based alignments by processing wavefronts of increasing edit distance. The algorithm operates as follows:

Core Algorithm Steps

Initialization: Create wavefront data structures with diagonal arrays covering all possible alignment paths (rLen + qLen + 1 diagonals)
Edit Distance Matrix: Initialize an (qLen+1) × (rLen+1) matrix to track minimum edit distances to each position
Backtracking Matrix: Maintain operation types (MATCH, SUB, INS, DEL) for reconstructing the alignment path
Wavefront Processing: For each edit distance (0 to qLen+rLen):
- Extend exact matches along active diagonals as far as possible
- Generate next wavefront positions through substitutions, insertions, and deletions
- Track visited cells and update edit distance matrix
- Terminate early when reaching the endpoint (qLen, rLen)
Traceback: Reconstruct the complete alignment by following operations from (qLen, rLen) back to (0, 0)

Operation Types

MATCH (OP_MATCH): Exact character match with no penalty, extended greedily
SUB (OP_SUB): Substitution - diagonal move with edit distance + 1
INS (OP_INS): Insertion (gap in reference) - vertical move with edit distance + 1
DEL (OP_DEL): Deletion (gap in query) - horizontal move with edit distance + 1

Visualization Data

When map output is enabled, the tool generates a complete record of the edit distance matrix showing:

Edit distances at each position in the alignment grid
State space exploration patterns showing which cells were visited
Wavefront progression through increasing edit distances

Memory Requirements

Memory usage is proportional to the product of sequence lengths:

Wavefront Arrays: 2 × (qLen + rLen + 1) integers
Edit Distance Matrix: (qLen + 1) × (rLen + 1) integers
Operations Matrix: (qLen + 1) × (rLen + 1) bytes
Total: Approximately 5 × qLen × rLen bytes

The default fixed allocation of 200MB is sufficient for sequences up to approximately 6,000 bases.

Performance Characteristics

Time Complexity: O(qLen × rLen) in worst case, but often faster due to early termination when endpoint is reached
Space Complexity: O(qLen × rLen) for complete edit distance and operations matrices
Loop Counter: Tracks total cells processed for performance analysis (accessible via loops() method)

Special Sequence Handling

Empty Sequences: Returns identity 0.0 with position vector [0, max(0, rLen-1)]
Sequence Swapping: When no position vector is needed and query is longer than reference, sequences are swapped to optimize memory access patterns, then results are corrected
N Characters: Treated specially during matching (details in WaveFrontAlignerViz implementation)

Integration with visualizealignment.sh

The map output format is specifically designed for processing by visualizealignment.sh, which creates graphical representations showing:

Color-coded edit distance heatmaps
Wavefront progression patterns
Optimal alignment path through the matrix
State space exploration efficiency

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org