WavefrontAligner

Script: wavefrontaligner.sh Package: aligner Class: WaveFrontAligner.java

Aligns a query sequence to a reference using WaveFrontAligner. The implementation is designed for visualization and is thus very inefficient, and purely for academic use. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.

Basic Usage

wavefrontaligner.sh <query> <ref>
wavefrontaligner.sh <query> <ref> <map>
wavefrontaligner.sh <query> <ref> <map> <iterations>

The tool accepts sequences either as literal nucleotide strings or as fasta files. The output includes sequence identity percentage and reference alignment positions (rstart, rstop).

Parameters

WaveFrontAligner has a simple parameter set focused on input/output specification and optional visualization and benchmarking features.

Parameters

query: A literal nucleotide sequence or fasta file. This is the sequence to be aligned against the reference. Can contain any characters, with 'N' treated as a special case during alignment.
ref: A literal nucleotide sequence or fasta file. This is the reference sequence against which the query will be aligned. Can contain any characters, with 'N' treated as a special case during alignment.
map: Optional output text file for matrix score space visualization. This file contains the state space exploration map that shows the wavefront progression during alignment. Set to "null" for benchmarking with no visualization output. The map can be fed to visualizealignment.sh to create graphical representations.
iterations: Optional integer for benchmarking multiple iterations. When specified, the alignment will be performed this many times consecutively, useful for performance testing and timing measurements.

Examples

Basic Alignment

wavefrontaligner.sh ATCGATCG ATCGTTCG

Aligns two literal DNA sequences, outputting identity percentage and alignment positions.

File-based Alignment

wavefrontaligner.sh query.fasta reference.fasta

Aligns sequences from fasta files.

Alignment with Visualization

wavefrontaligner.sh ATCGATCG ATCGTTCG alignment_map.txt

Performs alignment and outputs a state space exploration map to alignment_map.txt for visualization.

Benchmarking

wavefrontaligner.sh ATCGATCG ATCGTTCG null 1000

Runs alignment 1000 times for benchmarking without generating visualization output.

Algorithm Details

WaveFront Alignment Algorithm

WaveFrontAligner implements a global alignment algorithm based on the WaveFront approach, which explores the edit distance space using diagonal wavefronts. This implementation uses rolling buffer arrays and edit distance iteration for visualization and educational purposes rather than maximum performance.

Core Algorithm Components

Diagonal Representation: The algorithm operates on diagonals k ranging from -qLen to rLen, where k = j - i (reference position minus query position)
Rolling Buffer Strategy: Uses two arrays (currentWF and nextWF) to track wavefront positions, minimizing memory usage compared to full matrix approaches
Edit Distance Iteration: Processes edit distances from 0 to maxEditDist, where maxEditDist = max(qLen, rLen) + 8
Match Extension: At each wavefront position, extends matches as far as possible before considering edit operations

Edit Operations

The algorithm supports three standard edit operations:

Insertion: Extends to diagonal k-1 with position i+1
Substitution: Extends to diagonal k with position i+1
Deletion: Extends to diagonal k+1 with position i

Memory and Performance Characteristics

Space Complexity: O(qLen + rLen) for wavefront storage using rolling buffer approach with two arrays instead of O(qLen × rLen) full matrix storage
Time Complexity: O(editDistance × (qLen + rLen)) in typical cases, with additional match extension overhead
Loop Counting: Tracks two types of loops - wavefront processing (loopCounter) and match extensions (loopCounter2) for performance analysis
Memory Usage: Uses 2000MB default memory allocation with 42% of available RAM in auto mode

Identity Calculation

The algorithm calculates sequence identity as: identity = max(0.0, 1.0 - editDistance / max(qLen, rLen)). This provides a percentage similarity score based on the minimum number of edits required to transform the query into the reference.

Special Cases and Edge Conditions

Empty Sequences: Returns identity of 1.0 if both sequences are empty, 0.0 if only one is empty
Character Support: Handles any characters in sequences, with 'N' receiving special treatment during alignment
Alignment Bounds: Sets position vector [0, max(0, rLen-1)] for global alignment
Reference Windows: Supports alignment within specified reference regions through the alignStatic wrapper method

Visualization Output

When a map file is specified, the algorithm generates a state space exploration map showing the progression of wavefronts through the alignment matrix. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process, making it valuable for educational and debugging purposes.

Academic and Research Applications

This implementation prioritizes clarity and visualization capability over raw performance, making it ideal for:

Educational demonstrations of wavefront alignment algorithms
Algorithm research and development
Debugging and understanding alignment behavior
Comparative studies of different alignment approaches

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org