AllToAll

Basic Usage

alltoall.sh in=<input file> out=<output file>

Input may be FASTA or FASTQ, compressed or uncompressed. The tool aligns every input sequence against every other and outputs a symmetric identity matrix of pairwise percentage identities.

Parameters

Parameters control input/output settings, threading, and memory management for the all-to-all alignment process.

Standard parameters

in=<file>: Input sequences. Accepts FASTA or FASTQ format, compressed or uncompressed.
out=<file>: Output data. Tab-delimited identity matrix with sequence names as headers and percentage identity values (0-100).
t=: Set the number of threads; default is the number of logical processors. Work is distributed across ProcessThread workers via an AtomicInteger counter.
overwrite=f: (ow) Set to false to force the program to abort rather than overwrite an existing file.
showspeed=t: (ss) Set to 'f' to suppress display of processing speed.
ziplevel=2: (zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression uses less CPU time.
reads=-1: If positive, quit after this many sequences. Useful for testing on a subset of a large dataset.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Examples

Basic All-to-All Alignment

alltoall.sh in=sequences.fasta out=identity_matrix.txt

Performs pairwise alignment between all sequences in sequences.fasta and outputs an identity matrix to identity_matrix.txt.

Multi-threaded Processing

alltoall.sh in=large_dataset.fq out=results.txt t=8

Uses 8 threads to process a large FASTQ dataset.

Subset Analysis

alltoall.sh in=sequences.fa out=subset_matrix.txt reads=100

Processes only the first 100 sequences from the input file, producing a 100-by-100 matrix; useful for testing or smaller analyses.

Algorithm Details

AllToAll implements an all-versus-all sequence alignment algorithm using SketchObject.align() with AtomicInteger work distribution and lower-triangle computation:

Alignment Strategy

SketchObject Integration: Uses SketchObject.align() method for approximate sequence alignment between all pairs
Symmetric Matrix: Generates a symmetric identity matrix where entry (i,j) represents the sequence identity between sequence i and sequence j
Self-Alignment: Diagonal entries are automatically set to 1.0 (100% identity, reported as 100.00 in the output) rather than computed, since every sequence aligns perfectly to itself
Half-Matrix Computation: Only computes the lower triangle of the matrix, then mirrors values to create the symmetric upper triangle
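The lower-triangle strategy above can be sketched in Java, the language BBTools is written in. This is a minimal single-threaded sketch, not the tool's source: identity() is a hypothetical placeholder that counts matching positions without gaps, standing in for SketchObject.align(), whose signature is not documented here. Each pair (i, j) with j < i is scored once and mirrored, and the diagonal is set to 1.0 without aligning.

```java
import java.util.List;

/**
 * Minimal sketch of all-vs-all identity over the lower triangle.
 * identity() is a hypothetical stand-in for SketchObject.align();
 * it is NOT the tool's aligner.
 */
class LowerTriangleSketch {

    // Hypothetical placeholder: fraction of matching positions (0.0 to 1.0)
    // over the shorter sequence, with no gaps allowed.
    static float identity(String a, String b) {
        int len = Math.min(a.length(), b.length());
        if (len == 0) return 0f;
        int matches = 0;
        for (int k = 0; k < len; k++) {
            if (a.charAt(k) == b.charAt(k)) matches++;
        }
        return matches / (float) len;
    }

    // Fills a symmetric n x n matrix, aligning each pair (i, j) with j < i once.
    static float[][] allToAll(List<String> seqs) {
        int n = seqs.size();
        float[][] results = new float[n][n];
        for (int i = 0; i < n; i++) {
            results[i][i] = 1.0f; // self-alignment: 100% identity, not computed
            for (int j = 0; j < i; j++) {
                float id = identity(seqs.get(i), seqs.get(j));
                results[i][j] = id; // lower triangle
                results[j][i] = id; // mirrored into the upper triangle
            }
        }
        return results;
    }

    public static void main(String[] args) {
        float[][] m = allToAll(List.of("ACGT", "ACGA", "TCGA"));
        System.out.println(m[1][0] + " " + m[0][1]); // 0.75 0.75
    }
}
```

Scoring only j < i means n sequences cost n(n-1)/2 alignments instead of n², and the mirrored copy guarantees the matrix is exactly symmetric.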

Multi-Threading Implementation

Thread Pool: Creates ProcessThread workers equal to the number of available threads
Work Distribution: Uses AtomicInteger counter for lock-free work distribution among threads
Query Processing: Each thread claims a query sequence and aligns it against every sequence with a lower index, so each pair is aligned exactly once
Synchronized Output: Writes to the shared results matrix are synchronized
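The AtomicInteger work distribution described above can be sketched as follows. This is a hedged illustration, not the tool's code: plain java.lang.Thread workers stand in for ProcessThread, and identity() is the same hypothetical gap-free placeholder for SketchObject.align(). Each thread repeatedly calls getAndIncrement() to claim the next query index, aligns it against all lower indices, and writes its row to the shared matrix under a lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicWorkSketch {

    // Hypothetical placeholder for SketchObject.align(): fraction of matching
    // positions over the shorter sequence, no gaps. Not the tool's aligner.
    static float identity(String a, String b) {
        int len = Math.min(a.length(), b.length());
        if (len == 0) return 0f;
        int matches = 0;
        for (int k = 0; k < len; k++) {
            if (a.charAt(k) == b.charAt(k)) matches++;
        }
        return matches / (float) len;
    }

    static float[][] run(List<String> seqs, int threads) {
        final int n = seqs.size();
        final float[][] results = new float[n][n];
        final AtomicInteger next = new AtomicInteger(0); // next unclaimed query index
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                // getAndIncrement() hands each query index to exactly one thread, lock-free.
                for (int q = next.getAndIncrement(); q < n; q = next.getAndIncrement()) {
                    float[] row = new float[q + 1];
                    row[q] = 1.0f; // self-alignment
                    for (int r = 0; r < q; r++) {
                        row[r] = identity(seqs.get(q), seqs.get(r)); // lower indices only
                    }
                    synchronized (results) { // serialize writes to the shared matrix
                        for (int r = 0; r <= q; r++) {
                            results[q][r] = row[r];
                            results[r][q] = row[r]; // mirror into the upper triangle
                        }
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every worker; join() also makes their writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return results;
    }
}
```

Claiming queries dynamically suits this workload because query q needs q alignments, so later queries are much heavier than early ones; a fixed, even split of indices would leave some threads idle. Since every cell is written by exactly one thread, the lock here mirrors the synchronized write described above rather than resolving conflicting writes; Thread.join() is what guarantees the caller sees all results.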

Memory and Performance

Memory Usage: Stores all input sequences in memory in an ArrayList<Read>, giving indexed access to any sequence during alignment
Space Complexity: O(n²) for the identity matrix, where n is the number of sequences, in addition to the memory needed to hold the sequences themselves
Time Complexity: n(n-1)/2 pairwise alignments, i.e. O(n²), parallelized across available threads; the cost of each alignment also grows with sequence length
Default Memory: Uses a 4 GB heap by default (-Xmx4g), adjusted automatically based on available system RAM; set -Xmx to override
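The sizes involved can be estimated before a run. The sketch below assumes one 4-byte float per matrix cell (an assumption; the storage type is not stated above) and ignores JVM object overhead and the stored sequences, so it gives a lower bound on memory for the matrix alone.

```java
/**
 * Back-of-the-envelope sizing for an all-to-all run with n sequences.
 * Assumes a dense n x n matrix of 4-byte floats (hypothetical storage type).
 */
class AllToAllSizing {

    // Number of distinct pairs aligned when only the lower triangle is computed.
    static long alignments(long n) {
        return n * (n - 1) / 2;
    }

    // Bytes for a dense n x n matrix of 4-byte floats.
    static long matrixBytes(long n) {
        return n * n * Float.BYTES;
    }

    public static void main(String[] args) {
        long n = 10_000;
        System.out.println(alignments(n));  // 49995000
        System.out.println(matrixBytes(n)); // 400000000 (about 400 MB)
    }
}
```

Under the same assumption, 100,000 sequences would need about 40 GB for the matrix alone, well above the 4 GB default heap, so -Xmx would have to be raised accordingly.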

Output Format

Tab-Delimited Matrix: First row contains sequence names as column headers
Identity Scores: Values represent percentage identity (0-100) with 2 decimal places
Row Labels: Each data row starts with the sequence name
Symmetric Structure: Matrix is symmetric around the diagonal, with diagonal values always 100.00
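The output layout can be illustrated with a small formatter. It assumes identities are held as fractions (0.0 to 1.0) and multiplied by 100 on output, and that the header row begins with an empty corner cell; both are assumptions about details the description above does not pin down. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

/**
 * Sketch of a tab-delimited identity matrix writer (hypothetical names).
 * Header: empty corner cell, then sequence names; each row: name, then percentages.
 */
class MatrixWriterSketch {

    static String format(List<String> names, float[][] m) {
        StringBuilder sb = new StringBuilder();
        // Header row: assumed empty corner cell, then one column per sequence name.
        for (String name : names) {
            sb.append('\t').append(name);
        }
        sb.append('\n');
        for (int i = 0; i < names.size(); i++) {
            sb.append(names.get(i)); // row label
            for (int j = 0; j < names.size(); j++) {
                // Percentage identity, 2 decimal places; Locale.ROOT forces '.' as the separator.
                sb.append('\t').append(String.format(Locale.ROOT, "%.2f", 100 * m[i][j]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        float[][] m = {{1.0f, 0.75f}, {0.75f, 1.0f}};
        System.out.print(format(List.of("seqA", "seqB"), m));
    }
}
```

For two sequences with an identity of 0.75, main() prints a header line followed by the rows seqA 100.00 75.00 and seqB 75.00 100.00, tab-separated.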

Statistical Reporting

The tool reports processing statistics including total sequences processed, bases analyzed, number of alignments performed, and processing time with throughput metrics.

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org