WavefrontAligner
Aligns a query sequence to a reference using WaveFrontAligner. The implementation is designed for visualization and is thus very inefficient, and purely for academic use. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.
Basic Usage
wavefrontaligner.sh <query> <ref>
wavefrontaligner.sh <query> <ref> <map>
wavefrontaligner.sh <query> <ref> <map> <iterations>
The tool accepts sequences either as literal nucleotide strings or as fasta files. The output includes sequence identity percentage and reference alignment positions (rstart, rstop).
Parameters
WaveFrontAligner has a simple parameter set focused on input/output specification and optional visualization and benchmarking features.
Parameters
- query
- A literal nucleotide sequence or fasta file. This is the sequence to be aligned against the reference. Can contain any characters, with 'N' treated as a special case during alignment.
- ref
- A literal nucleotide sequence or fasta file. This is the reference sequence against which the query will be aligned. Can contain any characters, with 'N' treated as a special case during alignment.
- map
- Optional output text file for matrix score space visualization. This file contains the state space exploration map that shows the wavefront progression during alignment. Set to "null" for benchmarking with no visualization output. The map can be fed to visualizealignment.sh to create graphical representations.
- iterations
- Optional integer for benchmarking multiple iterations. When specified, the alignment will be performed this many times consecutively, useful for performance testing and timing measurements.
Examples
Basic Alignment
wavefrontaligner.sh ATCGATCG ATCGTTCG
Aligns two literal DNA sequences, outputting identity percentage and alignment positions.
File-based Alignment
wavefrontaligner.sh query.fasta reference.fasta
Aligns sequences from fasta files.
Alignment with Visualization
wavefrontaligner.sh ATCGATCG ATCGTTCG alignment_map.txt
Performs alignment and outputs a state space exploration map to alignment_map.txt for visualization.
Benchmarking
wavefrontaligner.sh ATCGATCG ATCGTTCG null 1000
Runs alignment 1000 times for benchmarking without generating visualization output.
Algorithm Details
WaveFront Alignment Algorithm
WaveFrontAligner implements a global alignment algorithm based on the WaveFront approach, which explores the edit distance space using diagonal wavefronts. This implementation uses rolling buffer arrays and edit distance iteration for visualization and educational purposes rather than maximum performance.
Core Algorithm Components
- Diagonal Representation: The algorithm operates on diagonals k ranging from -qLen to rLen, where k = j - i (reference position minus query position)
- Rolling Buffer Strategy: Uses two arrays (currentWF and nextWF) to track wavefront positions, minimizing memory usage compared to full matrix approaches
- Edit Distance Iteration: Processes edit distances from 0 to maxEditDist, where maxEditDist = max(qLen, rLen) + 8
- Match Extension: At each wavefront position, extends matches as far as possible before considering edit operations
Edit Operations
The algorithm supports three standard edit operations:
- Insertion: Extends to diagonal k-1 with position i+1
- Substitution: Extends to diagonal k with position i+1
- Deletion: Extends to diagonal k+1 with position i
Memory and Performance Characteristics
- Space Complexity: O(qLen + rLen) for wavefront storage using rolling buffer approach with two arrays instead of O(qLen × rLen) full matrix storage
- Time Complexity: O(editDistance × (qLen + rLen)) in typical cases, with additional match extension overhead
- Loop Counting: Tracks two types of loops - wavefront processing (loopCounter) and match extensions (loopCounter2) for performance analysis
- Memory Usage: Uses 2000MB default memory allocation with 42% of available RAM in auto mode
Identity Calculation
The algorithm calculates sequence identity as: identity = max(0.0, 1.0 - editDistance / max(qLen, rLen)). This provides a percentage similarity score based on the minimum number of edits required to transform the query into the reference.
Special Cases and Edge Conditions
- Empty Sequences: Returns identity of 1.0 if both sequences are empty, 0.0 if only one is empty
- Character Support: Handles any characters in sequences, with 'N' receiving special treatment during alignment
- Alignment Bounds: Sets position vector [0, max(0, rLen-1)] for global alignment
- Reference Windows: Supports alignment within specified reference regions through the alignStatic wrapper method
Visualization Output
When a map file is specified, the algorithm generates a state space exploration map showing the progression of wavefronts through the alignment matrix. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process, making it valuable for educational and debugging purposes.
Academic and Research Applications
This implementation prioritizes clarity and visualization capability over raw performance, making it ideal for:
- Educational demonstrations of wavefront alignment algorithms
- Algorithm research and development
- Debugging and understanding alignment behavior
- Comparative studies of different alignment approaches
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org