WavefrontAligner

Script: wavefrontaligner.sh Package: aligner Class: WaveFrontAligner.java

Aligns a query sequence to a reference using WaveFrontAligner. The implementation is designed for visualization, so it is very inefficient and intended purely for academic use. The sequences can contain any characters, but N is treated as a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map, which can be fed to visualizealignment.sh to make an image.

Basic Usage

wavefrontaligner.sh <query> <ref>
wavefrontaligner.sh <query> <ref> <map>
wavefrontaligner.sh <query> <ref> <map> <iterations>

The tool accepts each sequence either as a literal nucleotide string or as a FASTA file. The output includes the sequence identity and the reference alignment positions (rstart, rstop).

Parameters

WaveFrontAligner has a simple parameter set focused on input/output specification and optional visualization and benchmarking features.

query
A literal nucleotide sequence or FASTA file. This is the sequence to be aligned against the reference. It can contain any characters, with 'N' treated as a special case during alignment.
ref
A literal nucleotide sequence or FASTA file. This is the reference sequence against which the query is aligned. It can contain any characters, with 'N' treated as a special case during alignment.
map
Optional output text file for matrix score space visualization. This file contains the state space exploration map that shows the wavefront progression during alignment. Set to "null" for benchmarking with no visualization output. The map can be fed to visualizealignment.sh to create graphical representations.
iterations
Optional integer for benchmarking multiple iterations. When specified, the alignment is performed this many times consecutively, which is useful for performance testing and timing measurements.

Examples

Basic Alignment

wavefrontaligner.sh ATCGATCG ATCGTTCG

Aligns two literal DNA sequences, outputting identity percentage and alignment positions.

File-based Alignment

wavefrontaligner.sh query.fasta reference.fasta

Aligns sequences read from FASTA files.

Alignment with Visualization

wavefrontaligner.sh ATCGATCG ATCGTTCG alignment_map.txt

Performs alignment and outputs a state space exploration map to alignment_map.txt for visualization.

Benchmarking

wavefrontaligner.sh ATCGATCG ATCGTTCG null 1000

Runs alignment 1000 times for benchmarking without generating visualization output.
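As a rough illustration of what repeated-iteration benchmarking measures, the following Python sketch times any alignment function over a fixed number of consecutive runs. It is not how wavefrontaligner.sh performs its own timing; the helper name benchmark and its return shape are assumptions made for this example.

```python
import time


def benchmark(align, query, ref, iterations):
    """Run align(query, ref) `iterations` times and report the mean time.

    `align` is any callable taking (query, ref); it stands in for an
    aligner and is a hypothetical placeholder, not a BBTools API.
    Returns (last_result, mean_seconds_per_run).
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    start = time.perf_counter()
    result = None
    for _ in range(iterations):
        result = align(query, ref)
    elapsed = time.perf_counter() - start
    return result, elapsed / iterations
```

Averaging over many runs smooths out startup and scheduling noise, which is the usual reason for running an alignment more than once when timing it.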

Algorithm Details

WaveFront Alignment Algorithm

WaveFrontAligner implements a global alignment algorithm based on the WaveFront approach, which explores the edit distance space using diagonal wavefronts. This implementation uses rolling buffer arrays and iterates over increasing edit distance. It is written for visualization and educational purposes rather than maximum performance.
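To make the wavefront idea concrete, here is a minimal Python sketch of a unit-cost wavefront edit distance for global alignment. It is not the code in WaveFrontAligner.java: the function name, the dictionary-based wavefront, and the absence of any special handling for N are assumptions made for illustration. For each edit distance it records only the furthest query offset reached on each diagonal, and it keeps just the previous wavefront, loosely mirroring the rolling-buffer idea.

```python
def wavefront_edit_distance(query, ref):
    """Unit-cost global edit distance computed with diagonal wavefronts.

    Diagonal k is defined as (ref index - query index). For each edit
    distance, front[k] holds the furthest query offset reached on k.
    Illustrative sketch only; 'N' gets no special treatment here.
    """
    m, n = len(query), len(ref)
    target_diag = n - m  # diagonal that contains the end cell (m, n)

    def extend(i, k):
        # Slide along diagonal k for free while characters match.
        j = i + k
        while i < m and j < n and query[i] == ref[j]:
            i += 1
            j += 1
        return i

    front = {0: extend(0, 0)}  # wavefront for edit distance 0
    distance = 0
    while front.get(target_diag) != m:
        nxt = {}
        for k in range(-(distance + 1), distance + 2):
            candidates = []
            if k in front:
                candidates.append(front[k] + 1)      # substitution
            if k - 1 in front:
                candidates.append(front[k - 1])      # consume a ref char only
            if k + 1 in front:
                candidates.append(front[k + 1] + 1)  # consume a query char only
            # Keep only offsets that stay inside the alignment matrix.
            valid = [i for i in candidates if 0 <= i <= m and 0 <= i + k <= n]
            if valid:
                nxt[k] = extend(max(valid), k)
        front = nxt  # only the newest wavefront is retained
        distance += 1
    return distance
```

For the sequences used in the Examples section, wavefront_edit_distance("ATCGATCG", "ATCGTTCG") returns 1, since a single substitution separates them.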

Core Algorithm Components

Edit Operations

The algorithm supports the three standard edit operations: substitution, insertion, and deletion.
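As an illustration, the Python sketch below shows how substitution, insertion, and deletion each enter a conventional full-matrix edit distance. It assumes unit cost for every operation and no special handling of N, and it is a textbook dynamic-programming formulation rather than the wavefront method itself. Which gap direction is called an insertion and which a deletion is a labeling convention chosen here, not something taken from the tool.

```python
def edit_distance_dp(query, ref):
    """Unit-cost edit distance using two rolling rows of the DP matrix."""
    m, n = len(query), len(ref)
    prev = list(range(n + 1))  # row 0: only ref characters consumed
    for i in range(1, m + 1):
        curr = [i] + [0] * n   # column 0: only query characters consumed
        for j in range(1, n + 1):
            # Substitution costs 1 on a mismatch and 0 on a match.
            substitution = prev[j - 1] + (query[i - 1] != ref[j - 1])
            # Deletion (label assumed): query char with no ref partner.
            deletion = prev[j] + 1
            # Insertion (label assumed): ref char with no query partner.
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[n]
```

A full-matrix routine like this is a convenient cross-check for any wavefront implementation, because both should report the same distance under unit costs.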

Memory and Performance Characteristics

Identity Calculation

The algorithm calculates sequence identity as: identity = max(0.0, 1.0 - editDistance / max(qLen, rLen)). This yields a similarity score between 0.0 and 1.0: the minimum number of edits required to transform the query into the reference, normalized by the length of the longer sequence and subtracted from 1.
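The identity formula can be written as a small Python function. The function and parameter names are chosen for this example, and returning 1.0 when both lengths are zero is an assumption, since the formula itself does not define that case.

```python
def identity(edit_distance, q_len, r_len):
    """identity = max(0.0, 1.0 - editDistance / max(qLen, rLen))."""
    longest = max(q_len, r_len)
    if longest == 0:
        # Assumption: two empty sequences are treated as identical.
        return 1.0
    # Python's '/' is floating-point division even for integer operands.
    return max(0.0, 1.0 - edit_distance / longest)
```

With one edit between two 8-base sequences, identity(1, 8, 8) gives 0.875. In languages where dividing two integers truncates, the division has to be carried out in floating point or the result collapses to 0.0 or 1.0.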

Special Cases and Edge Conditions

Visualization Output

When a map file is specified, the algorithm generates a state space exploration map showing the progression of wavefronts through the alignment matrix. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process, making it valuable for educational and debugging purposes.

Academic and Research Applications

This implementation prioritizes clarity and visualization capability over raw performance, which suits it to teaching and research use rather than production-scale alignment.
