CrosscutAligner

Script: crosscutaligner.sh Package: aligner Class: CrossCutAligner.java

Aligns a query sequence to a reference using CrossCutAligner. This fully explores the matrix using 4 arrays of roughly length reflen. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. CrossCut is a nontraditional aligner that fills antidiagonals, incurring zero data dependencies between loops. This allows perfect SIMD vectorization.

Basic Usage

crosscutaligner.sh <query> <ref>
crosscutaligner.sh <query> <ref> <map>
crosscutaligner.sh <query> <ref> <map> <iterations> <simd>

CrossCutAligner is a specialized sequence aligner that processes alignment matrices by filling antidiagonals rather than rows or columns. This approach eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization for high-performance alignment.

Parameters

CrossCutAligner accepts positional parameters for alignment configuration and optional benchmarking features.

Required Parameters

query: A literal nucleotide sequence or path to a FASTA file containing the query sequence to be aligned. Can contain any characters, with N treated as a special ambiguous nucleotide.
ref: A literal nucleotide sequence or path to a FASTA file containing the reference sequence for alignment. The alignment matrix will have dimensions based on the reference length.

Optional Parameters

map: Optional output text file for matrix score space visualization. Set to "null" for benchmarking with no visualization output. This feature has not yet been fully tested and may produce unexpected results in the current implementation.
iterations: Optional integer specifying the number of alignment iterations to perform for benchmarking purposes. Used to measure performance characteristics of the alignment algorithm across multiple runs.
simd: Enable vector (SIMD) instructions for optimized performance. The CrossCut algorithm's antidiagonal processing approach allows for perfect vectorization, significantly improving alignment speed on supported processors.

Examples

Basic Sequence Alignment

crosscutaligner.sh ATCGATCG ATCGATCGATCG

Aligns the query sequence "ATCGATCG" against the reference "ATCGATCGATCG" and returns identity score with alignment positions.

File-Based Alignment

crosscutaligner.sh query.fasta reference.fasta

Aligns sequences from FASTA files, processing the first sequence from each file.

Alignment with Visualization

crosscutaligner.sh query.fasta reference.fasta alignment_matrix.txt

Performs alignment and outputs the scoring matrix to a text file for visualization and analysis.

Benchmarking with SIMD

crosscutaligner.sh ATCGATCG ATCGATCGATCG null 1000 true

Runs 1000 alignment iterations with SIMD optimization enabled, using null output for pure performance testing.

Algorithm Details

CrossCut Alignment Strategy

CrossCutAligner implements a diagonal-processing alignment algorithm that fills alignment matrices by processing antidiagonals rather than traditional row-by-row or column-by-column methods. This fundamental change eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization.

Matrix Processing Method

The algorithm uses four long[] arrays to fully explore the alignment matrix:

Diagonal Arrays: Three long[maxDiagLen] arrays (diag_km2, diag_km1, diag_k) where maxDiagLen=Math.max(qLen, rLen)+1, tracking packed scoring values across consecutive diagonals
Bottom Array: One long[rLen+1] array storing final packed scores along the bottom row for optimal alignment identification via postprocess() method

Antidiagonal Processing

The core innovation processes diagonals that span from bottom-left to top-right of the alignment matrix. The main loop iterates k=2 to qLen+rLen, where for each diagonal k (where k = row + col), the algorithm:

Calculates valid row ranges using minRow=Math.max(1, k-rLen) and maxRow=Math.min(qLen, k-1)
Handles boundary conditions with specialized handleTop() and handleLeft() methods for matrix edge cells
Processes inner cells using conditional shared.SIMDAlign.processCrossCutDiagonalSIMD() or scalar calculateCellValue() operations
Rotates diagonal arrays using temp=diag_km2; diag_km2=diag_km1; diag_km1=diag_k; diag_k=temp for next iteration

Scoring System

The alignment uses a 64-bit packed scoring system with bit fields defined by constants:

Position bits (21): POSITION_BITS=21 tracking alignment start positions (up to 2Mbp sequences) using POSITION_MASK
Deletion bits (21): DEL_BITS=21 counting deletion operations using DEL_MASK shifted by POSITION_BITS
Score bits (22): Remaining bits storing alignment scores with SCORE_SHIFT=42
Match scoring: MATCH=1L << SCORE_SHIFT for matches
Substitution/Indel scoring: SUB=INS=DEL=(-1L) << SCORE_SHIFT for penalties
N bases: N_SCORE=0L for ambiguous nucleotides

SIMD Vectorization

The antidiagonal approach eliminates inter-loop data dependencies, allowing perfect SIMD vectorization through shared.SIMDAlign.processCrossCutDiagonalSIMD(). The conditional vectorization uses Shared.SIMD runtime detection with scalar fallback, providing significant performance improvements on modern processors supporting vector instructions.

Cell Calculation

Individual cell scores are calculated using calculateCellValue() method with branchless operations:

Match detection: isMatch=(q==r) comparison
N detection: hasN=(q|r)>15 check for ambiguous bases
Score calculation: scoreAdd=isMatch?MATCH:(hasN?N_SCORE:SUB)
Maximum selection: maxDiagUp=Math.max(diagScore, upScore) followed by conditional comparison with leftScore

Output and Results

CrossCutAligner returns results processed by postprocess() method:

Identity: Fraction of matching bases calculated as matches/(matches+substitutions+insertions+deletions)
Reference Start (rstart): Starting position extracted from (maxScore&POSITION_MASK)
Reference Stop (rstop): Ending position using maxPos-1
Operation counts: Matches, substitutions, insertions, and deletions recovered via mathematical constraints

Performance Characteristics

The algorithm provides measurable performance advantages:

Memory Usage: Uses only 4 long arrays with total memory O(max(qLen, rLen))
SIMD Performance: Perfect vectorization through antidiagonal processing eliminates data dependencies
Sequence Limits: Supports sequences up to 2Mbp with 21-bit position encoding
Performance Tracking: Built-in loops.addAndGet(mloops) counting for computational analysis

Special Sequence Handling

The algorithm includes optimizations for edge cases:

Sequence Swapping: Automatically swaps query/reference using alignStatic() logic if query.length>ref.length when posVector==null
Empty Sequences: Handles zero-length inputs with qLen==0 || rLen==0 check calling postprocess() directly
N Base Processing: Special scoring using (q|r)>15 detection with N_SCORE=0 assignment
Factory Encoding: Sequences processed through Factory.encodeLong() with (byte)(15+32) and (byte)(15+16) parameters

Performance Notes

CrossCutAligner is designed for high-performance applications requiring exact alignment identity scores. The antidiagonal processing approach provides several computational advantages:

Vectorization: Perfect SIMD utilization through data dependency elimination
Memory Access: Sequential array access patterns in diagonal processing optimize cache usage
Scalability: Linear memory usage O(max(qLen, rLen)) with reference sequence length
Benchmarking: Built-in AtomicLong loops counter for performance measurement and algorithm analysis

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org