AllToAll

Script: alltoall.sh
Package: aligner
Class: AllToAll.java

Aligns every input sequence against every other input sequence to produce an identity matrix.

Basic Usage

alltoall.sh in=<input file> out=<output file>

Input may be fasta or fastq, compressed or uncompressed. The tool performs pairwise alignment between all input sequences and outputs a symmetric identity matrix showing sequence similarity scores.

Parameters

Parameters control input/output settings, threading, and memory management for the all-to-all alignment process.

Standard parameters

in=<file>
Input sequences. Accepts FASTA or FASTQ format, compressed or uncompressed.
out=<file>
Output data. Tab-delimited identity matrix with sequence names as headers and percentage identity values (0-100).
t=
Set the number of threads; the default is the number of logical processors. Work is distributed across ProcessThread workers through a shared AtomicInteger.
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file.
showspeed=t
(ss) Set to 'f' to suppress display of processing speed.
ziplevel=2
(zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression uses less CPU time.
reads=-1
If positive, quit after this many sequences. Useful for testing with a subset of a large dataset.

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Basic All-to-All Alignment

alltoall.sh in=sequences.fasta out=identity_matrix.txt

Performs pairwise alignment between all sequences in sequences.fasta and outputs an identity matrix to identity_matrix.txt.
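The resulting matrix can be loaded for downstream analysis. The following is a minimal Java sketch, not part of BBTools. The class name IdentityMatrixReader is hypothetical, and the layout it expects is an assumption: a header row of sequence names after a corner cell, then one row per sequence made of a name followed by its identity values. Check the actual file before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of BBTools) for reading a tab-delimited identity matrix.
// ASSUMED layout: header row = corner cell + sequence names;
// each later row = sequence name + one identity value per column.
class IdentityMatrixReader {
    final List<String> names = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    static IdentityMatrixReader parse(List<String> lines) {
        IdentityMatrixReader m = new IdentityMatrixReader();
        boolean headerSeen = false;
        for (String line : lines) {
            if (line.isEmpty()) { continue; }           // tolerate blank lines
            String[] cells = line.split("\t", -1);      // -1 keeps an empty corner cell
            if (!headerSeen) {
                // Header row: skip the corner cell, keep the sequence names.
                m.names.addAll(Arrays.asList(cells).subList(1, cells.length));
                headerSeen = true;
                continue;
            }
            // Data row: skip the row label, parse the identity values.
            double[] values = new double[cells.length - 1];
            for (int i = 1; i < cells.length; i++) {
                values[i - 1] = Double.parseDouble(cells[i]);
            }
            m.rows.add(values);
        }
        return m;
    }

    // Identity (0-100) between the a-th and b-th sequences.
    double identity(int a, int b) {
        return rows.get(a)[b];
    }

    public static void main(String[] args) throws IOException {
        IdentityMatrixReader m = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println(m.names.size() + " sequences loaded");
    }
}
```

Parsing from a list of lines rather than directly from a path keeps the sketch easy to try on a few hand-written rows.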

Multi-threaded Processing

alltoall.sh in=large_dataset.fq out=results.txt t=8

Uses 8 threads to process a large FASTQ dataset with AtomicInteger work distribution across ProcessThread workers.
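The general pattern behind AtomicInteger work distribution can be sketched as follows. This is an illustration of the technique only, not the AllToAll source: the class WorkDistributionSketch and its method are hypothetical, and the per-task work is a stand-in counter rather than a real alignment.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative sketch only; names are hypothetical and not taken from BBTools.
class WorkDistributionSketch {

    // Starts `threads` workers that claim task indices from one shared counter.
    // Returns how many times each task index was processed.
    static int[] run(int tasks, int threads) {
        AtomicInteger next = new AtomicInteger(0);
        AtomicIntegerArray hits = new AtomicIntegerArray(tasks);
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // getAndIncrement() is atomic, so no two workers receive the same index.
                for (int i = next.getAndIncrement(); i < tasks; i = next.getAndIncrement()) {
                    hits.incrementAndGet(i); // stand-in for aligning sequence i
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }

        int[] counts = new int[tasks];
        for (int i = 0; i < tasks; i++) {
            counts[i] = hits.get(i);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run(1000, 8);
        System.out.println("tasks processed: " + counts.length);
    }
}
```

A shared counter balances load without a queue: a worker that finishes a cheap task simply claims the next index, so slow tasks do not stall the other threads.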

Subset Analysis

alltoall.sh in=sequences.fa out=subset_matrix.txt reads=100

Processes only the first 100 sequences from the input file, useful for testing or smaller analyses.

Algorithm Details

AllToAll implements an all-versus-all sequence alignment algorithm using SketchObject.align() with AtomicInteger work distribution and lower-triangle computation.
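Lower-triangle computation relies on identity being symmetric, so each pair needs to be aligned only once and the result can be mirrored. The sketch below shows the idea under stated assumptions: the identity() method is a simple ungapped placeholder and is NOT SketchObject.align(), the diagonal is assumed to be 100, and the class name LowerTriangleSketch is hypothetical.

```java
// Illustrative sketch only; not the AllToAll implementation.
class LowerTriangleSketch {

    // Placeholder metric, NOT SketchObject.align(): percentage of positions that match
    // when the two strings are compared position by position without gaps,
    // relative to the longer length.
    static float identity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) { return 100f; }
        int min = Math.min(a.length(), b.length());
        int matches = 0;
        for (int i = 0; i < min; i++) {
            if (a.charAt(i) == b.charAt(i)) { matches++; }
        }
        return 100f * matches / max;
    }

    // Fills a symmetric n x n matrix while computing only the pairs with j < i.
    static float[][] matrix(String[] seqs) {
        int n = seqs.length;
        float[][] m = new float[n][n];
        for (int i = 0; i < n; i++) {
            m[i][i] = 100f;                 // assumption: self-identity is 100
            for (int j = 0; j < i; j++) {   // lower triangle only
                float id = identity(seqs[i], seqs[j]);
                m[i][j] = id;
                m[j][i] = id;               // mirror into the upper triangle
            }
        }
        return m;
    }

    public static void main(String[] args) {
        float[][] m = matrix(new String[] {"ACGT", "ACGA", "TTTT"});
        for (float[] row : m) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Restricting the inner loop to j < i roughly halves the number of alignments compared with filling every cell independently.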

Alignment Strategy

Multi-Threading Implementation

Memory and Performance

Output Format

Statistical Reporting

The tool reports processing statistics including total sequences processed, bases analyzed, number of alignments performed, and processing time with throughput metrics.
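As a rough guide to the alignment count, and assuming that only the lower triangle is computed and self-comparisons are skipped, n sequences require n(n-1)/2 alignments. The helper below is a hypothetical sketch for estimating that count and a throughput figure; it does not reproduce the tool's own reporting code or output format.

```java
// Hypothetical helper for back-of-envelope estimates; not part of BBTools.
class AlignmentStats {

    // Number of unordered pairs among n sequences, excluding self-comparisons.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Alignments per second, given a count and an elapsed time in nanoseconds.
    static double perSecond(long alignments, long elapsedNanos) {
        return alignments / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        long n = 100;
        long pairs = pairCount(n);
        System.out.println(n + " sequences -> " + pairs + " alignments");
    }
}
```

Because the count grows quadratically, doubling the number of input sequences roughly quadruples the work, which is worth checking with reads= before a large run.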

Support

For questions and support: