WaveFrontAlignerViz
Aligns a query sequence to a reference using the WaveFront alignment algorithm. The sequences can be any characters, but N is treated as a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map that can be fed to visualizealignment.sh to generate alignment visualization images.
Basic Usage
wavefrontalignerviz.sh <query> <ref>
wavefrontalignerviz.sh <query> <ref> <map>
wavefrontalignerviz.sh <query> <ref> <map> <iterations>
WaveFrontAlignerViz performs global sequence alignment using the WaveFront algorithm, which efficiently computes edit distance-based alignments. The tool can process both literal nucleotide sequences and sequences from fasta files.
Parameters
Parameters are positional arguments provided in the order shown in the usage examples.
Standard Parameters
- query
- A literal nucleotide sequence or fasta file containing the query sequence to align.
- ref
- A literal nucleotide sequence or fasta file containing the reference sequence to align against.
- map
- Optional output text file for matrix score space visualization. This file contains the state space exploration data and can be processed by visualizealignment.sh to create alignment images. Set to null for benchmarking with no visualization output.
- iterations
- Optional integer for benchmarking multiple iterations. Runs the alignment multiple times to measure performance characteristics.
Java Parameters
- -Xmx
- Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Output Format
Standard Output
The tool prints alignment results to standard output, including:
- Identity: The fraction of aligned positions that match (matches / total aligned positions)
- rstart: Starting position in the reference sequence (0-based)
- rstop: Ending position in the reference sequence (0-based, inclusive)
Map File Format
When a map file is specified, the tool generates a text representation of the edit distance matrix showing the state space exploration during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.
Examples
Basic Alignment with Literal Sequences
wavefrontalignerviz.sh ACGTACGT ACGTTACGT
Aligns two literal nucleotide sequences and outputs identity and alignment positions.
Alignment with Fasta Files
wavefrontalignerviz.sh query.fa reference.fa
Aligns sequences from fasta files.
Generate Visualization Map
wavefrontalignerviz.sh query.fa reference.fa alignment_map.txt
Performs alignment and writes state space exploration data to alignment_map.txt for later visualization.
Benchmarking with Multiple Iterations
wavefrontalignerviz.sh query.fa reference.fa null 1000
Runs alignment 1000 times without generating visualization output, useful for performance benchmarking.
Complete Workflow with Visualization
wavefrontalignerviz.sh query.fa reference.fa map.txt
visualizealignment.sh map.txt alignment_image.png
First generates the alignment map, then creates a visual representation of the alignment process.
Algorithm Details
WaveFront Alignment Algorithm
WaveFrontAlignerViz implements a global alignment algorithm based on the WaveFront approach, which efficiently computes edit distance-based alignments by processing wavefronts of increasing edit distance. The algorithm operates as follows:
Core Algorithm Steps
- Initialization: Create wavefront data structures with diagonal arrays covering all possible alignment paths (rLen + qLen + 1 diagonals)
- Edit Distance Matrix: Initialize an (qLen+1) × (rLen+1) matrix to track minimum edit distances to each position
- Backtracking Matrix: Maintain operation types (MATCH, SUB, INS, DEL) for reconstructing the alignment path
- Wavefront Processing: For each edit distance (0 to qLen+rLen):
- Extend exact matches along active diagonals as far as possible
- Generate next wavefront positions through substitutions, insertions, and deletions
- Track visited cells and update edit distance matrix
- Terminate early when reaching the endpoint (qLen, rLen)
- Traceback: Reconstruct the complete alignment by following operations from (qLen, rLen) back to (0, 0)
Operation Types
- MATCH (OP_MATCH): Exact character match with no penalty, extended greedily
- SUB (OP_SUB): Substitution - diagonal move with edit distance + 1
- INS (OP_INS): Insertion (gap in reference) - vertical move with edit distance + 1
- DEL (OP_DEL): Deletion (gap in query) - horizontal move with edit distance + 1
Visualization Data
When map output is enabled, the tool generates a complete record of the edit distance matrix showing:
- Edit distances at each position in the alignment grid
- State space exploration patterns showing which cells were visited
- Wavefront progression through increasing edit distances
Memory Requirements
Memory usage is proportional to the product of sequence lengths:
- Wavefront Arrays: 2 × (qLen + rLen + 1) integers
- Edit Distance Matrix: (qLen + 1) × (rLen + 1) integers
- Operations Matrix: (qLen + 1) × (rLen + 1) bytes
- Total: Approximately 5 × qLen × rLen bytes
The default fixed allocation of 200MB is sufficient for sequences up to approximately 6,000 bases.
Performance Characteristics
- Time Complexity: O(qLen × rLen) in worst case, but often faster due to early termination when endpoint is reached
- Space Complexity: O(qLen × rLen) for complete edit distance and operations matrices
- Loop Counter: Tracks total cells processed for performance analysis (accessible via loops() method)
Special Sequence Handling
- Empty Sequences: Returns identity 0.0 with position vector [0, max(0, rLen-1)]
- Sequence Swapping: When no position vector is needed and query is longer than reference, sequences are swapped to optimize memory access patterns, then results are corrected
- N Characters: Treated specially during matching (details in WaveFrontAlignerViz implementation)
Integration with visualizealignment.sh
The map output format is specifically designed for processing by visualizealignment.sh, which creates graphical representations showing:
- Color-coded edit distance heatmaps
- Wavefront progression patterns
- Optimal alignment path through the matrix
- State space exploration efficiency
Support
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org