ScrabbleAligner

Script: scrabblealigner.sh Package: aligner Class: ScrabbleAligner.java

Aligns sequences using adaptive banding with dynamic bandwidth adjustment. Uses only 2 arrays and avoids traceback while calculating exact ANI. Band center drifts toward highest scores, and bandwidth dynamically widens or narrows in response to sequence identity. Similar to WobbleAligner but trades the ring buffer for scalar variables for band tracking.

Basic Usage

scrabblealigner.sh <query> <ref>
scrabblealigner.sh <query> <ref> <map>
scrabblealigner.sh <query> <ref> <map> <iterations>

ScrabbleAligner accepts literal nucleotide sequences or FASTA files as input. Sequences can be any characters, but N is a special case (scores 0 to avoid penalizing unknown bases). The tool outputs identity, rstart, and rstop positions. Optionally generates a state space exploration map for visualization.

Parameters

Parameters are positional arguments. The tool also accepts standard Java memory flags.

Standard Parameters

query
A literal nucleotide sequence or fasta file.
ref
A literal nucleotide sequence or fasta file.
map
Optional output text file for matrix score space. Set to null for benchmarking with no visualization.
iterations
Optional integer for benchmarking multiple iterations.

Java Parameters

-Xmx
Set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 200m (fixed allocation).
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Output Format

ScrabbleAligner outputs alignment results to standard output:

Identity: 0.9876
Reference start: 123
Reference stop: 5432

Visualization Map

When the map parameter is specified, ScrabbleAligner generates a state space exploration map showing the alignment matrix cells explored during banded alignment. This map can be fed to visualizealignment.sh to create images showing the alignment pathway.

Examples

Basic Alignment

scrabblealigner.sh ACGTACGTACGT ACGTACGTACGT

Align two literal sequences and report identity.

FASTA File Input

scrabblealigner.sh query.fa reference.fa

Align sequences from FASTA files.

Generate Visualization Map

scrabblealigner.sh query.fa ref.fa alignment_map.txt

Align sequences and generate state space exploration map for visualization.

Visualization Pipeline

scrabblealigner.sh query.fa ref.fa map.txt
visualizealignment.sh map.txt alignment.png

Complete workflow: align sequences, generate map, create visualization image.

Performance Benchmarking

scrabblealigner.sh query.fa ref.fa null 1000

Run 1000 iterations with no visualization output for performance testing.

Algorithm Details

Adaptive Banding Strategy

ScrabbleAligner uses a sophisticated adaptive banding approach that balances speed with accuracy:

  1. Initial Bandwidth: Calculated by comparing the first min(qLen, rLen) bases. For high-identity sequences, the band starts narrow. For divergent sequences, it starts wider. The formula is: mid(7, 1+max(qLen,rLen)/32, 20+sqrt(rLen)/8)
  2. Dynamic Bandwidth Adjustment: As alignment progresses, the bandwidth widens when sequences diverge (rapid increase up to 8 cells per row) and narrows when sequences match (cautious reduction by doubling the decrement). This allows the algorithm to handle local variations while maintaining efficiency.
  3. Band Center Drift: The band center moves toward the highest-scoring position in each row (limited to ±2 cells per row). This allows the alignment to "follow" the optimal path without exploring the entire matrix.
  4. Glocal Capability: Band starts wide at the top (16 + 12×bandwidth) and narrows toward the bottom, allowing free initial insertions for glocal alignments while maintaining efficiency.

Memory-Efficient Design

Bit-Packing Scheme

Each cell stores three values in a single 64-bit long:

Scoring Scheme

Identity Calculation

ScrabbleAligner calculates exact identity without traceback by solving the system of equations:

M + S + I = qLen
M + S + D = refAlnLength
Score = M - S - I - D

Where M=matches, S=substitutions, I=insertions, D=deletions. The solution yields: matches = (score + qLen + deletions) / 2, from which identity = matches / (matches + substitutions + insertions + deletions).

Performance Characteristics

Comparison to Other Aligners

ScrabbleAligner vs WobbleAligner: Both use adaptive banding with drift, but ScrabbleAligner uses scalar variables for band tracking instead of WobbleAligner's ring buffer approach. The scalar design simplifies the implementation while maintaining comparable performance.

Integration with Visualization: ScrabbleAligner's map output format is compatible with visualizealignment.sh, enabling visual exploration of the adaptive banding pathway through the alignment matrix.

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org