QuabbleAligner

Script: quabblealigner.sh
Package: aligner
Class: QuabbleAligner.java

Aligns a query sequence to a reference and gives an exact answer without traceback. The aligner uses only two arrays and calculates the reference start and stop positions (rstart and rstop) directly instead of tracing back through a matrix. The reference is limited to 2 Mbp (2,097,152 bp) because positions are stored in 21 bits.

Basic Usage

quabblealigner.sh <query> <ref>
quabblealigner.sh <query> <ref> <map>
quabblealigner.sh <query> <ref> <map> <iterations>

Parameters

QuabbleAligner accepts the following positional parameters for sequence alignment:

query
A literal nucleotide sequence or fasta file. The query sequence to be aligned against the reference. It may contain any characters; 'N' is treated as a special case representing an ambiguous nucleotide.
ref
A literal nucleotide sequence or fasta file. The reference sequence that the query will be aligned to. Maximum length is limited to 2Mbp (2,097,152 bp) due to 21-bit position encoding.
map
Optional output text file for matrix score space visualization. Set to "null" for benchmarking with no visualization output. This file can be fed to visualizealignment.sh to create an image representation of the alignment state space exploration.
iterations
Optional integer for benchmarking multiple iterations of the same alignment. Used for performance testing to measure alignment speed and consistency across multiple runs.
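
As a rough illustration of how the four positional parameters fit together, the sketch below parses them into a small holder object. It is not the tool's own parser. The class name AlignerArgs, the default of 1 iteration when the fourth argument is omitted, and the "existing file means fasta, otherwise literal sequence" check are all assumptions made for this example.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical holder for the positional arguments; not part of QuabbleAligner.
class AlignerArgs {
    final String query;    // literal sequence or path to a fasta file
    final String ref;      // literal sequence or path to a fasta file
    final String map;      // null when no visualization output is wanted
    final int iterations;  // assumed to default to 1 when omitted

    AlignerArgs(String query, String ref, String map, int iterations) {
        this.query = query;
        this.ref = ref;
        this.map = map;
        this.iterations = iterations;
    }

    static AlignerArgs parse(String[] argv) {
        if (argv.length < 2 || argv.length > 4) {
            throw new IllegalArgumentException("usage: <query> <ref> [map] [iterations]");
        }
        // The literal string "null" means "no map output", as in the benchmarking usage.
        String map = (argv.length >= 3 && !"null".equals(argv[2])) ? argv[2] : null;
        int iterations = (argv.length == 4) ? Integer.parseInt(argv[3]) : 1;
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be at least 1");
        }
        return new AlignerArgs(argv[0], argv[1], map, iterations);
    }

    // Assumed heuristic: an argument naming an existing regular file is a fasta path;
    // anything else is treated as a literal sequence.
    static boolean looksLikeFile(String arg) {
        return Files.isRegularFile(Paths.get(arg));
    }
}
```

With this sketch, the arguments "ATCGATCG ATCGATCGATCG null 1000" yield a null map and 1000 iterations, while two arguments alone yield a null map and a single iteration.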

Examples

Basic Sequence Alignment

quabblealigner.sh ATCGATCG ATCGATCGATCG

Aligns the query sequence "ATCGATCG" to the reference "ATCGATCGATCG" and outputs identity percentage along with reference start and stop positions.

File-based Alignment

quabblealigner.sh query.fasta ref.fasta

Aligns sequences from FASTA files, reading the query from query.fasta and reference from ref.fasta.

Alignment with Visualization

quabblealigner.sh ATCGATCG ATCGATCGATCG alignment_map.txt

Performs alignment and outputs the state space exploration matrix to alignment_map.txt, which can be visualized using visualizealignment.sh.

Benchmarking Mode

quabblealigner.sh ATCGATCG ATCGATCGATCG null 1000

Runs the alignment 1000 times for benchmarking purposes without generating visualization output (map set to null).

Algorithm Details

Core Algorithm

QuabbleAligner implements traceback-free pairwise sequence alignment using two rolling arrays, prev[rLen+1] and curr[rLen+1], so memory use is O(n) in the reference length rather than the O(n*m) of a full matrix. Instead of storing a traceback matrix, each cell carries bit-packed position, deletion count, and score information, and the algorithm calculates Average Nucleotide Identity (ANI) from those values through mathematical constraint solving.
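
The general idea of a two-array, traceback-free alignment can be sketched as follows. This is a minimal illustration and not QuabbleAligner's actual code: the class name RollingAlignSketch, the +1/-1 scoring, and the two-field cell (score above a 21-bit start position) are assumptions chosen for the example, and it uses plain character equality with no special handling of 'N'. Each cell carries the reference start of its best path, so rstart is read from the winning cell and rstop from its column, without a traceback matrix.

```java
// Illustrative sketch only; scoring and bit layout are assumptions, not QuabbleAligner's.
class RollingAlignSketch {
    static final int POS_BITS = 21;                      // start position lives in the low 21 bits
    static final long POS_MASK = (1L << POS_BITS) - 1;
    static final long MATCH = 1L << POS_BITS;            // +1 to the score field
    static final long SUB = -(1L << POS_BITS);           // -1 to the score field
    static final long INS = SUB;                         // query base with no reference base
    static final long DEL = SUB;                         // reference base with no query base

    /** Returns {score, rstart, rstop}; reference coordinates are 0-based and inclusive. */
    static int[] align(String query, String ref) {
        int qLen = query.length(), rLen = ref.length();
        // This sketch stores the column index itself, so it accepts up to 2^21 - 1 bases.
        if (rLen > POS_MASK) {
            throw new IllegalArgumentException("reference too long for 21 position bits");
        }
        long[] prev = new long[rLen + 1];
        long[] curr = new long[rLen + 1];
        // Row 0: the alignment may start at any reference column with score 0.
        for (int j = 0; j <= rLen; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= qLen; i++) {
            char q = query.charAt(i - 1);
            curr[0] = prev[0] + INS;
            for (int j = 1; j <= rLen; j++) {
                long diag = prev[j - 1] + (q == ref.charAt(j - 1) ? MATCH : SUB);
                long up = prev[j] + INS;
                long left = curr[j - 1] + DEL;
                // Adding a multiple of 2^21 changes only the score field, so the
                // start position of the chosen predecessor is carried along for free.
                curr[j] = Math.max(diag, Math.max(up, left));
            }
            long[] tmp = prev;  // swap the two rows instead of allocating a matrix
            prev = curr;
            curr = tmp;
        }
        int bestJ = 0;
        for (int j = 1; j <= rLen; j++) {
            if (prev[j] > prev[bestJ]) {
                bestJ = j;
            }
        }
        long best = prev[bestJ];
        int score = (int) (best >> POS_BITS);   // arithmetic shift keeps the sign
        int rstart = (int) (best & POS_MASK);
        return new int[] {score, rstart, bestJ - 1};
    }
}
```

For example, aligning "GATTACA" to "CCCGATTACACCC" with this sketch gives score 7, rstart 3, and rstop 9. Because the score sits in the high bits, a single Math.max over the packed values picks the best-scoring predecessor and its start position together.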

Key Features

Scoring System

The algorithm uses a bit-packed scoring system with the following components:

Adaptive Bandwidth

QuabbleAligner automatically determines the alignment bandwidth based on sequence characteristics:

Sparse Matrix Strategy

The algorithm employs IntList-based sparse processing for memory efficiency:

Bit Field Organization

Each score entry contains packed information in a 64-bit long:
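
The exact field widths are not listed here. As an illustration only, the sketch below packs a score, a deletion count, and a position into one long. The 21 position bits come from the stated 2 Mbp limit; the 21-bit deletion field, the placement of the score in the remaining upper 22 bits, and the names PackedCellSketch, pack, score, deletions, and position are assumptions for this example.

```java
// Hypothetical layout for illustration; only the 21 position bits come from the text.
class PackedCellSketch {
    static final int POS_BITS = 21;                      // 2^21 = 2,097,152 positions
    static final int DEL_BITS = 21;                      // assumed width
    static final int SCORE_SHIFT = POS_BITS + DEL_BITS;  // score in the upper 22 bits (assumed)
    static final long POS_MASK = (1L << POS_BITS) - 1;
    static final long DEL_MASK = ((1L << DEL_BITS) - 1) << POS_BITS;

    // Packs the three fields; deletions and position are masked to their field widths.
    static long pack(int score, int deletions, int position) {
        return ((long) score << SCORE_SHIFT)
                | (((long) deletions << POS_BITS) & DEL_MASK)
                | ((long) position & POS_MASK);
    }

    // Arithmetic shift so that negative scores survive the round trip.
    static int score(long cell) {
        return (int) (cell >> SCORE_SHIFT);
    }

    static int deletions(long cell) {
        return (int) ((cell & DEL_MASK) >>> POS_BITS);
    }

    static int position(long cell) {
        return (int) (cell & POS_MASK);
    }
}
```

Placing the score in the highest bits means that comparing two packed values as ordinary signed longs compares their scores first, so the lower fields only break ties.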

Performance Characteristics

Output Information

QuabbleAligner returns detailed alignment statistics:

Special Handling
