CrosscutAligner
Aligns a query sequence to a reference using CrossCutAligner. This fully explores the matrix using 4 arrays of roughly length reflen. The sequences can be any characters, but N is a special case. Outputs the identity, rstart, and rstop positions. CrossCut is a nontraditional aligner that fills antidiagonals, incurring zero data dependencies between loops. This allows perfect SIMD vectorization.
Basic Usage
crosscutaligner.sh <query> <ref>
crosscutaligner.sh <query> <ref> <map>
crosscutaligner.sh <query> <ref> <map> <iterations> <simd>
CrossCutAligner is a specialized sequence aligner that processes alignment matrices by filling antidiagonals rather than rows or columns. This approach eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization for high-performance alignment.
Parameters
CrossCutAligner accepts positional parameters for alignment configuration and optional benchmarking features.
Required Parameters
- query
- A literal nucleotide sequence or path to a FASTA file containing the query sequence to be aligned. Can contain any characters, with N treated as a special ambiguous nucleotide.
- ref
- A literal nucleotide sequence or path to a FASTA file containing the reference sequence for alignment. The alignment matrix will have dimensions based on the reference length.
Optional Parameters
- map
- Optional output text file for matrix score space visualization. Set to "null" for benchmarking with no visualization output. This feature has not yet been fully tested and may produce unexpected results in the current implementation.
- iterations
- Optional integer specifying the number of alignment iterations to perform for benchmarking purposes. Used to measure performance characteristics of the alignment algorithm across multiple runs.
- simd
- Enable vector (SIMD) instructions for optimized performance. The CrossCut algorithm's antidiagonal processing approach allows for perfect vectorization, significantly improving alignment speed on supported processors.
Examples
Basic Sequence Alignment
crosscutaligner.sh ATCGATCG ATCGATCGATCG
Aligns the query sequence "ATCGATCG" against the reference "ATCGATCGATCG" and returns identity score with alignment positions.
File-Based Alignment
crosscutaligner.sh query.fasta reference.fasta
Aligns sequences from FASTA files, processing the first sequence from each file.
Alignment with Visualization
crosscutaligner.sh query.fasta reference.fasta alignment_matrix.txt
Performs alignment and outputs the scoring matrix to a text file for visualization and analysis.
Benchmarking with SIMD
crosscutaligner.sh ATCGATCG ATCGATCGATCG null 1000 true
Runs 1000 alignment iterations with SIMD optimization enabled, using null output for pure performance testing.
Algorithm Details
CrossCut Alignment Strategy
CrossCutAligner implements a diagonal-processing alignment algorithm that fills alignment matrices by processing antidiagonals rather than traditional row-by-row or column-by-column methods. This fundamental change eliminates data dependencies between loop iterations, enabling perfect SIMD vectorization.
Matrix Processing Method
The algorithm uses four long[] arrays to fully explore the alignment matrix:
- Diagonal Arrays: Three long[maxDiagLen] arrays (diag_km2, diag_km1, diag_k) where maxDiagLen=Math.max(qLen, rLen)+1, tracking packed scoring values across consecutive diagonals
- Bottom Array: One long[rLen+1] array storing final packed scores along the bottom row for optimal alignment identification via postprocess() method
Antidiagonal Processing
The core innovation processes diagonals that span from bottom-left to top-right of the alignment matrix. The main loop iterates k=2 to qLen+rLen, where for each diagonal k (where k = row + col), the algorithm:
- Calculates valid row ranges using minRow=Math.max(1, k-rLen) and maxRow=Math.min(qLen, k-1)
- Handles boundary conditions with specialized handleTop() and handleLeft() methods for matrix edge cells
- Processes inner cells using conditional shared.SIMDAlign.processCrossCutDiagonalSIMD() or scalar calculateCellValue() operations
- Rotates diagonal arrays using temp=diag_km2; diag_km2=diag_km1; diag_km1=diag_k; diag_k=temp for next iteration
Scoring System
The alignment uses a 64-bit packed scoring system with bit fields defined by constants:
- Position bits (21): POSITION_BITS=21 tracking alignment start positions (up to 2Mbp sequences) using POSITION_MASK
- Deletion bits (21): DEL_BITS=21 counting deletion operations using DEL_MASK shifted by POSITION_BITS
- Score bits (22): Remaining bits storing alignment scores with SCORE_SHIFT=42
- Match scoring: MATCH=1L << SCORE_SHIFT for matches
- Substitution/Indel scoring: SUB=INS=DEL=(-1L) << SCORE_SHIFT for penalties
- N bases: N_SCORE=0L for ambiguous nucleotides
SIMD Vectorization
The antidiagonal approach eliminates inter-loop data dependencies, allowing perfect SIMD vectorization through shared.SIMDAlign.processCrossCutDiagonalSIMD(). The conditional vectorization uses Shared.SIMD runtime detection with scalar fallback, providing significant performance improvements on modern processors supporting vector instructions.
Cell Calculation
Individual cell scores are calculated using calculateCellValue() method with branchless operations:
- Match detection: isMatch=(q==r) comparison
- N detection: hasN=(q|r)>15 check for ambiguous bases
- Score calculation: scoreAdd=isMatch?MATCH:(hasN?N_SCORE:SUB)
- Maximum selection: maxDiagUp=Math.max(diagScore, upScore) followed by conditional comparison with leftScore
Output and Results
CrossCutAligner returns results processed by postprocess() method:
- Identity: Fraction of matching bases calculated as matches/(matches+substitutions+insertions+deletions)
- Reference Start (rstart): Starting position extracted from (maxScore&POSITION_MASK)
- Reference Stop (rstop): Ending position using maxPos-1
- Operation counts: Matches, substitutions, insertions, and deletions recovered via mathematical constraints
Performance Characteristics
The algorithm provides measurable performance advantages:
- Memory Usage: Uses only 4 long arrays with total memory O(max(qLen, rLen))
- SIMD Performance: Perfect vectorization through antidiagonal processing eliminates data dependencies
- Sequence Limits: Supports sequences up to 2Mbp with 21-bit position encoding
- Performance Tracking: Built-in loops.addAndGet(mloops) counting for computational analysis
Special Sequence Handling
The algorithm includes optimizations for edge cases:
- Sequence Swapping: Automatically swaps query/reference using alignStatic() logic if query.length>ref.length when posVector==null
- Empty Sequences: Handles zero-length inputs with qLen==0 || rLen==0 check calling postprocess() directly
- N Base Processing: Special scoring using (q|r)>15 detection with N_SCORE=0 assignment
- Factory Encoding: Sequences processed through Factory.encodeLong() with (byte)(15+32) and (byte)(15+16) parameters
Performance Notes
CrossCutAligner is designed for high-performance applications requiring exact alignment identity scores. The antidiagonal processing approach provides several computational advantages:
- Vectorization: Perfect SIMD utilization through data dependency elimination
- Memory Access: Sequential array access patterns in diagonal processing optimize cache usage
- Scalability: Linear memory usage O(max(qLen, rLen)) with reference sequence length
- Benchmarking: Built-in AtomicLong loops counter for performance measurement and algorithm analysis
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org