WaveFrontAlignerViz

Script: wavefrontalignerviz.sh Package: aligner Class: WaveFrontAlignerViz.java

Aligns a query sequence to a reference using the WaveFront alignment algorithm. The sequences can be any characters, but N is treated as a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map that can be fed to visualizealignment.sh to generate alignment visualization images.

Basic Usage

wavefrontalignerviz.sh <query> <ref>
wavefrontalignerviz.sh <query> <ref> <map>
wavefrontalignerviz.sh <query> <ref> <map> <iterations>

WaveFrontAlignerViz performs global sequence alignment using the WaveFront algorithm, which efficiently computes edit distance-based alignments. The tool can process both literal nucleotide sequences and sequences from fasta files.

Parameters

Parameters are positional arguments provided in the order shown in the usage examples.

Standard Parameters

query
A literal nucleotide sequence or fasta file containing the query sequence to align.
ref
A literal nucleotide sequence or fasta file containing the reference sequence to align against.
map
Optional output text file for matrix score space visualization. This file contains the state space exploration data and can be processed by visualizealignment.sh to create alignment images. Set to null for benchmarking with no visualization output.
iterations
Optional integer for benchmarking multiple iterations. Runs the alignment multiple times to measure performance characteristics.

Java Parameters

-Xmx
Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Output Format

Standard Output

The tool prints alignment results to standard output, including:

Map File Format

When a map file is specified, the tool generates a text representation of the edit distance matrix showing the state space exploration during alignment. This visualization data can be processed by visualizealignment.sh to create graphical representations of the alignment process.

Examples

Basic Alignment with Literal Sequences

wavefrontalignerviz.sh ACGTACGT ACGTTACGT

Aligns two literal nucleotide sequences and outputs identity and alignment positions.

Alignment with Fasta Files

wavefrontalignerviz.sh query.fa reference.fa

Aligns sequences from fasta files.

Generate Visualization Map

wavefrontalignerviz.sh query.fa reference.fa alignment_map.txt

Performs alignment and writes state space exploration data to alignment_map.txt for later visualization.

Benchmarking with Multiple Iterations

wavefrontalignerviz.sh query.fa reference.fa null 1000

Runs alignment 1000 times without generating visualization output, useful for performance benchmarking.

Complete Workflow with Visualization

wavefrontalignerviz.sh query.fa reference.fa map.txt
visualizealignment.sh map.txt alignment_image.png

First generates the alignment map, then creates a visual representation of the alignment process.

Algorithm Details

WaveFront Alignment Algorithm

WaveFrontAlignerViz implements a global alignment algorithm based on the WaveFront approach, which efficiently computes edit distance-based alignments by processing wavefronts of increasing edit distance. The algorithm operates as follows:

Core Algorithm Steps

  1. Initialization: Create wavefront data structures with diagonal arrays covering all possible alignment paths (rLen + qLen + 1 diagonals)
  2. Edit Distance Matrix: Initialize an (qLen+1) × (rLen+1) matrix to track minimum edit distances to each position
  3. Backtracking Matrix: Maintain operation types (MATCH, SUB, INS, DEL) for reconstructing the alignment path
  4. Wavefront Processing: For each edit distance (0 to qLen+rLen):
    • Extend exact matches along active diagonals as far as possible
    • Generate next wavefront positions through substitutions, insertions, and deletions
    • Track visited cells and update edit distance matrix
    • Terminate early when reaching the endpoint (qLen, rLen)
  5. Traceback: Reconstruct the complete alignment by following operations from (qLen, rLen) back to (0, 0)

Operation Types

Visualization Data

When map output is enabled, the tool generates a complete record of the edit distance matrix showing:

Memory Requirements

Memory usage is proportional to the product of sequence lengths:

The default fixed allocation of 200MB is sufficient for sequences up to approximately 6,000 bases.

Performance Characteristics

Special Sequence Handling

Integration with visualizealignment.sh

The map output format is specifically designed for processing by visualizealignment.sh, which creates graphical representations showing:

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org