DriftingAligner

Script: driftingaligner.sh Package: aligner Class: DriftingAligner.java

Aligns a query sequence to a reference using DriftingAligner. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. Optionally prints a state space exploration map. This map can be fed to visualizealignment.sh to make an image.

Basic Usage

driftingaligner.sh <query> <ref>
driftingaligner.sh <query> <ref> <map>
driftingaligner.sh <query> <ref> <map> <iterations>

The tool accepts sequences as literal nucleotide strings or FASTA file paths. It outputs alignment identity and reference start/stop positions, with optional state space visualization for use with visualizealignment.sh.

Parameters

Parameters are provided as positional arguments in the order shown below.

Required Parameters

query
A literal nucleotide sequence or FASTA file path. The query sequence to be aligned against the reference.
ref
A literal nucleotide sequence or FASTA file path. The reference sequence that the query will be aligned to.

Optional Parameters

map
Optional output text file for matrix score space visualization. Set to null for benchmarking with no visualization. This file can be fed to visualizealignment.sh to create an alignment visualization image.
iterations
Optional integer for benchmarking multiple iterations. Useful for performance testing and timing measurements.

Examples

Basic Sequence Alignment

driftingaligner.sh ATCGATCGATCG ATCGATCGATCG

Aligns two identical sequences, should return identity of 1.0 with alignment spanning the full reference.

File-based Alignment

driftingaligner.sh query.fasta reference.fasta

Aligns sequences from FASTA files, outputting identity and alignment coordinates.

Alignment with Visualization

driftingaligner.sh query.fasta reference.fasta alignment_map.txt

Performs alignment and saves the state space exploration map to alignment_map.txt for visualization with visualizealignment.sh.

Benchmarking Mode

driftingaligner.sh ATCGATCGATCG ATCGATCGATCG null 1000

Runs the alignment 1000 times for benchmarking purposes without generating visualization output.

Algorithm Details

DriftingAligner implements a center-drifting banded dynamic programming algorithm using dual rolling arrays to achieve O(n) space complexity. The implementation features identity-responsive bandwidth adaptation and mathematical constraint solving for exact operation count recovery without traceback matrices.

Core Implementation Architecture

Bandwidth Decision Algorithm

The decideBandwidth() method (lines 58-64) calculates initial band width using early mismatch detection:

bandwidth = Tools.mid(8, 1+Math.max(qLen, rLen)/16, 40+(int)Math.sqrt(rLen)/4)
for(int i=0, minlen=Math.min(qLen, rLen); i<minlen && subs<bandwidth; i++) {
    subs+=(query[i]!=ref[i] ? 1 : 0);
}
return Math.min(subs+1, bandwidth);

Bit-Field Architecture

The 64-bit encoding system uses specific bit ranges and constants:

Scoring Constants and Operations

Mathematical Constraint Solving

The postprocess() method (lines 194-244) uses a system of linear equations to recover exact operation counts:

// Extract packed information
int originPos = (int)(maxScore & POSITION_MASK);
int deletions = (int)((maxScore & DEL_MASK) >> POSITION_BITS);
int rawScore = (int)(maxScore >> SCORE_SHIFT);

// Solve constraint system: M+S+I=qLen, M+S+D=refAlnLength, Score=M-S-I-D
int insertions = Math.max(0, qLen+deletions-refAlnLength);
float matches = ((rawScore+qLen+deletions)/2f);
float substitutions = Math.max(0, qLen-matches-insertions);
float identity = matches/(matches+substitutions+insertions+deletions);

Band Navigation and Quality Response

Sequence Processing Optimizations

Output Coordinate Calculation

The tool extracts alignment coordinates from bit-packed scores:

Support

For questions and support: