CrossCutAligner

Script: crosscutaligner.sh
Package: aligner
Class: CrossCutAligner.java

Aligns a query sequence to a reference using CrossCutAligner. The aligner fully explores the alignment matrix using 4 arrays, each of roughly length reflen. The sequences can contain any characters, but N is treated as a special case. Outputs the identity and the rstart and rstop positions. CrossCut is a nontraditional aligner that fills the matrix by antidiagonals, so there are no data dependencies between loop iterations. This allows perfect SIMD vectorization.

Basic Usage

crosscutaligner.sh <query> <ref>
crosscutaligner.sh <query> <ref> <map>
crosscutaligner.sh <query> <ref> <map> <iterations> <simd>

CrossCutAligner is a specialized sequence aligner that processes alignment matrices by filling antidiagonals rather than rows or columns. This approach eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization for high-performance alignment.

Parameters

CrossCutAligner accepts positional parameters for alignment configuration and optional benchmarking features.

Required Parameters

query
A literal nucleotide sequence or path to a FASTA file containing the query sequence to be aligned. The sequence can contain any characters; N is treated as a special case (an ambiguous nucleotide).
ref
A literal nucleotide sequence or path to a FASTA file containing the reference sequence for alignment. The alignment matrix will have dimensions based on the reference length.

Optional Parameters

map
Optional output text file for matrix score space visualization. Set to "null" for benchmarking with no visualization output. This feature has not yet been fully tested and may produce unexpected results in the current implementation.
iterations
Optional integer specifying the number of alignment iterations to perform for benchmarking purposes. Used to measure performance characteristics of the alignment algorithm across multiple runs.
simd
Enable vector (SIMD) instructions for optimized performance. The CrossCut algorithm's antidiagonal processing approach allows for perfect vectorization, significantly improving alignment speed on supported processors.

Examples

Basic Sequence Alignment

crosscutaligner.sh ATCGATCG ATCGATCGATCG

Aligns the query sequence "ATCGATCG" against the reference "ATCGATCGATCG" and reports the identity along with the rstart and rstop positions.

File-Based Alignment

crosscutaligner.sh query.fasta reference.fasta

Aligns sequences from FASTA files, processing the first sequence from each file.

Alignment with Visualization

crosscutaligner.sh query.fasta reference.fasta alignment_matrix.txt

Performs alignment and outputs the scoring matrix to a text file for visualization and analysis.

Benchmarking with SIMD

crosscutaligner.sh ATCGATCG ATCGATCGATCG null 1000 true

Runs 1000 alignment iterations with SIMD enabled. The map parameter is set to "null" so that no visualization output is written, leaving a pure performance test.

Algorithm Details

CrossCut Alignment Strategy

CrossCutAligner fills the alignment matrix one antidiagonal at a time rather than row by row or column by column. In the standard alignment recurrence each cell depends only on its left, upper, and upper-left neighbors, all of which lie on the two preceding antidiagonals, so the cells within one antidiagonal can be computed independently of one another. This eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization.

Matrix Processing Method

The algorithm uses four long[] arrays to fully explore the alignment matrix:

Antidiagonal Processing

The core innovation is to process antidiagonals, which run from the bottom-left to the top-right of the alignment matrix. The main loop iterates k from 2 to qLen+rLen, where k = row + col identifies one antidiagonal. For each antidiagonal k, the algorithm:
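As a rough illustration of this loop structure, the following sketch fills a dynamic-programming matrix one antidiagonal at a time. It is not the CrossCutAligner implementation: it computes plain edit distance rather than identity, it uses three rolling int[] buffers rather than the four long[] arrays mentioned earlier, and the class and method names (AntidiagonalSketch, editDistance) are hypothetical. What it shares with the description is the outer loop over k = row + col from 2 to qLen+rLen, and an inner loop that reads only from the two previous antidiagonals.

```java
final class AntidiagonalSketch {

    /** Edit distance computed antidiagonal by antidiagonal (illustrative only). */
    static int editDistance(String query, String ref) {
        final int qLen = query.length(), rLen = ref.length();
        if (qLen == 0) { return rLen; }
        if (rLen == 0) { return qLen; }

        // Each buffer is indexed by row; the buffer for antidiagonal k
        // stores cell (row, k - row) at index row.
        int[] prev2 = new int[qLen + 1]; // antidiagonal k-2
        int[] prev1 = new int[qLen + 1]; // antidiagonal k-1
        int[] cur = new int[qLen + 1];   // antidiagonal k

        prev2[0] = 0; // cell (0,0)
        prev1[0] = 1; // cell (0,1)
        prev1[1] = 1; // cell (1,0)

        for (int k = 2; k <= qLen + rLen; k++) {
            if (k <= rLen) { cur[0] = k; } // border cell (0,k)
            if (k <= qLen) { cur[k] = k; } // border cell (k,0)

            // Interior rows that actually lie on antidiagonal k.
            final int lo = Math.max(1, k - rLen);
            final int hi = Math.min(qLen, k - 1);

            // No iteration reads a value written by another iteration:
            // all reads come from prev1 and prev2, all writes go to cur.
            for (int row = lo; row <= hi; row++) {
                final int col = k - row;
                final int sub = (query.charAt(row - 1) == ref.charAt(col - 1)) ? 0 : 1;
                final int diag = prev2[row - 1] + sub; // cell (row-1, col-1)
                final int up = prev1[row - 1] + 1;     // cell (row-1, col)
                final int left = prev1[row] + 1;       // cell (row, col-1)
                cur[row] = Math.min(diag, Math.min(up, left));
            }

            // Rotate buffers: the oldest antidiagonal becomes scratch space.
            int[] tmp = prev2; prev2 = prev1; prev1 = cur; cur = tmp;
        }
        return prev1[qLen]; // cell (qLen, rLen) sits on the last antidiagonal
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting")); // prints 3
    }
}
```

Because the inner loop writes only to cur and reads only from prev1 and prev2, its iterations can run in any order. That independence is the property that makes an antidiagonal layout amenable to vectorization.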

Scoring System

The alignment uses a 64-bit packed scoring system with bit fields defined by constants:
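To make the idea of a packed 64-bit score concrete, here is a minimal sketch of one way to store a signed score and a position in a single long. The field layout, the widths, and every name in it (PackedScoreSketch, POSITION_BITS, MATCH, MISMATCH, pack, score, position) are assumptions chosen for illustration; they are not the constants used by CrossCutAligner.

```java
final class PackedScoreSketch {

    // Hypothetical layout: the low 21 bits hold a position,
    // the remaining high bits hold a signed score.
    static final int POSITION_BITS = 21;
    static final long POSITION_MASK = (1L << POSITION_BITS) - 1;

    // Score increments pre-shifted into the score field, so adding them
    // to a packed value changes the score and leaves the position bits alone.
    static final long MATCH = 1L << POSITION_BITS;       // score +1
    static final long MISMATCH = (-1L) << POSITION_BITS; // score -1

    static long pack(int score, int position) {
        return ((long) score << POSITION_BITS) | (position & POSITION_MASK);
    }

    static int score(long packed) {
        // Arithmetic shift keeps the sign of negative scores.
        return (int) (packed >> POSITION_BITS);
    }

    static int position(long packed) {
        return (int) (packed & POSITION_MASK);
    }

    public static void main(String[] args) {
        long cell = pack(10, 100) + MATCH;
        System.out.println(score(cell) + " " + position(cell)); // prints 11 100
    }
}
```

With the score in the high bits, a single Math.max on two packed values picks the higher score and carries the associated position along with it, so no separate traceback field has to be compared.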

SIMD Vectorization

The antidiagonal approach eliminates data dependencies between loop iterations, allowing perfect SIMD vectorization through shared.SIMDAlign.processCrossCutDiagonalSIMD(). Vectorization is conditional: the vectorized path is used when Shared.SIMD runtime detection reports support, and a scalar fallback is used otherwise. On modern processors that support vector instructions, this provides significant performance improvements.

Cell Calculation

Individual cell scores are calculated by the calculateCellValue() method using branchless operations:
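The sketch below shows what a branchless cell update can look like in general. It is an assumption-laden illustration, not the body of calculateCellValue(): the scoring values (MATCH, MISMATCH, GAP), the helper notEqual, and the class name BranchlessCellSketch are all hypothetical, and the special handling of N is not modeled.

```java
final class BranchlessCellSketch {

    // Hypothetical scoring values, chosen only for illustration.
    static final int MATCH = 1, MISMATCH = -1, GAP = -1;

    /** Returns 1 if a != b and 0 if a == b, using only bit operations. */
    static int notEqual(int a, int b) {
        int diff = a ^ b;             // zero exactly when a == b
        return (diff | -diff) >>> 31; // sign bit is set for any nonzero diff
    }

    /** Scores one cell from its upper-left, upper, and left neighbors. */
    static int cell(byte q, byte r, int diag, int up, int left) {
        int ne = notEqual(q, r);
        // Selects MATCH when ne == 0 and MISMATCH when ne == 1, via arithmetic.
        int diagScore = diag + MATCH + ne * (MISMATCH - MATCH);
        int gapScore = Math.max(up, left) + GAP;
        return Math.max(diagScore, gapScore);
    }

    public static void main(String[] args) {
        System.out.println(cell((byte) 'A', (byte) 'A', 5, 0, 0)); // prints 6
    }
}
```

The match/mismatch choice is made with arithmetic instead of an if statement, and the remaining selections go through Math.max, which JIT compilers commonly turn into branch-free instructions. Code of this shape is generally easier to vectorize than code with data-dependent branches.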

Output and Results

CrossCutAligner returns results processed by the postprocess() method:

Performance Characteristics

The algorithm provides measurable performance advantages:

Special Sequence Handling

The algorithm includes optimizations for edge cases:

Performance Notes

CrossCutAligner is designed for high-performance applications requiring exact alignment identity scores. The antidiagonal processing approach provides several computational advantages:

Support

For questions and support: