A_Sample_MT

Script: a_sample_mt.sh Package: template Class: A_SampleMT.java

A template for creating multi-threaded read processing tools in BBTools. This tool does nothing by itself and is designed to be easily modified into a new program that processes reads in multiple threads.

Basic Usage

a_sample_mt.sh in=<input file> out=<output file>

Input may be fasta or fastq, compressed or uncompressed.

Parameters

Standard Parameters

in=<file>: Primary input, or read 1 input.
in2=<file>: Read 2 input if reads are in two files.
out=<file>: Primary output, or read 1 output.
out2=<file>: Read 2 output if reads are in two files.
ziplevel=2: Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.

Processing Parameters

No processing parameters defined yet. This is a template for creating new tools.

Java Parameters

-Xmx: Set Java's memory usage, overriding autodetection. Examples: -Xmx20g for 20 gigs of RAM, -Xmx200m for 200 megs. The max is typically 85% of physical memory.
-eoom: Cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Examples

Basic Usage with Single Input

a_sample_mt.sh in=input.fq out=output.fq

Process a single input file and write to an output file.

Paired-End Input

a_sample_mt.sh in=read1.fq in2=read2.fq out=output1.fq out2=output2.fq

Process paired-end reads from two input files and write to two output files.

Algorithm Details

A_SampleMT implements the Accumulator<ProcessThread> interface, which lets it merge per-thread results after multi-threaded read processing. Its architecture is as follows:

Thread Management

Thread Synchronization: Uses ReentrantReadWriteLock for thread-safe accumulation of processing statistics across concurrent ProcessThread instances
Thread Spawning: Creates ArrayList<ProcessThread> with Shared.threads() worker threads, each processing reads independently via ThreadWaiter.startAndWait()
Buffer Management: Sets the output buffer size with Tools.mid(16, 128, (Shared.threads()*2)/3), which scales the buffer with thread count while keeping it between 16 and 128
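
The thread-management pattern described above can be sketched in plain Java. This is a minimal illustration, not the BBTools implementation: AccumulatorSketch, Worker, and run() are hypothetical names, the workers only count, and mid() is a local helper written on the assumption that Tools.mid returns the median of its three arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the accumulate-under-lock pattern; not BBTools code.
class AccumulatorSketch {

    private final ReadWriteLock rwlock = new ReentrantReadWriteLock();
    private long readsProcessed = 0; // shared total, written only under the write lock

    // Worker with a private counter, so its loop needs no synchronization.
    private static class Worker extends Thread {
        private final int readsToProcess;
        long readsProcessedT = 0;

        Worker(int readsToProcess) { this.readsToProcess = readsToProcess; }

        @Override
        public void run() {
            for (int i = 0; i < readsToProcess; i++) { readsProcessedT++; }
        }
    }

    // Merges one finished worker's counter into the shared total.
    private void accumulate(Worker w) {
        rwlock.writeLock().lock();
        try {
            readsProcessed += w.readsProcessedT;
        } finally {
            rwlock.writeLock().unlock();
        }
    }

    // Median of three values; assumed here to match what Tools.mid computes.
    static int mid(int a, int b, int c) {
        return Math.max(Math.min(a, b), Math.min(Math.max(a, b), c));
    }

    // Starts the workers, waits for each, and returns the merged total.
    long run(int threads, int readsPerThread) {
        List<Worker> workers = new ArrayList<>(threads);
        for (int i = 0; i < threads; i++) { workers.add(new Worker(readsPerThread)); }
        for (Worker w : workers) { w.start(); }
        for (Worker w : workers) {
            try {
                w.join(); // join() makes the worker's counter visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            accumulate(w);
        }
        rwlock.readLock().lock();
        try {
            return readsProcessed;
        } finally {
            rwlock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(new AccumulatorSketch().run(4, 1000)); // prints 4000
        System.out.println(mid(16, 128, (8 * 2) / 3));            // prints 16
    }
}
```

Keeping counters private to each worker and merging them once at the end means the lock is taken once per thread rather than once per read.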

Stream Processing Architecture

Input Streams: ConcurrentReadInputStream handles ListNum<Read> distribution to worker threads with automatic format detection via FileFormat.testInput()
Output Streams: ConcurrentReadOutputStream manages thread-safe write operations with configurable ordering and buffering
File I/O Optimization: Enables ByteFile.FORCE_MODE_BF2 when Shared.threads() > 2 for parallel file reading performance

Processing Workflow

processInner(): Each ProcessThread calls cris.nextList() to retrieve ListNum<Read> batches
processList(): Iterates through ArrayList<Read> within each batch, handling paired-end read validation
processReadPair(): Placeholder method where custom processing logic is implemented (the template version throws a RuntimeException)
Statistics Accumulation: Each thread updates its own readsProcessedT, basesProcessedT, readsOutT, and basesOutT counters, which are merged into the totals during accumulation
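
The workflow above can be sketched as a self-contained loop. This is an illustration under stated assumptions, not the BBTools code: WorkflowSketch and its nested Read class are hypothetical stand-ins, a BlockingQueue with an end marker replaces ConcurrentReadInputStream, and processReadPair() is assumed to return true when a pair should be kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the per-thread workflow; not BBTools code.
class WorkflowSketch {

    // Minimal stand-in for a read; a paired read links to its mate.
    static class Read {
        final String bases;
        Read mate;
        Read(String bases) { this.bases = bases; }
        int length() { return bases.length(); }
    }

    // End-of-input marker, compared by identity (an assumption of this sketch).
    static final List<Read> POISON = new ArrayList<>();

    long readsProcessedT, basesProcessedT, readsOutT, basesOutT;

    // Placeholder logic: keep every pair. The boolean "keep" result is assumed.
    boolean processReadPair(Read r1, Read r2) { return true; }

    // Walks one batch, counting input and retained reads and bases.
    List<Read> processList(List<Read> batch) {
        List<Read> kept = new ArrayList<>();
        for (Read r1 : batch) {
            Read r2 = r1.mate; // null for unpaired reads
            int pairReads = (r2 == null) ? 1 : 2;
            int pairBases = r1.length() + (r2 == null ? 0 : r2.length());
            readsProcessedT += pairReads;
            basesProcessedT += pairBases;
            if (processReadPair(r1, r2)) {
                readsOutT += pairReads;
                basesOutT += pairBases;
                kept.add(r1);
            }
        }
        return kept;
    }

    // Pulls batches until the end marker arrives.
    void processInner(BlockingQueue<List<Read>> in, List<Read> out) {
        try {
            for (List<Read> batch = in.take(); batch != POISON; batch = in.take()) {
                out.addAll(processList(batch));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Feeds one pair and one unpaired read through the loop and returns the counters.
    static long[] demo() {
        Read r1 = new Read("ACGT");
        Read r2 = new Read("TTGCA");
        r1.mate = r2;
        r2.mate = r1;
        Read single = new Read("GG");
        BlockingQueue<List<Read>> queue = new LinkedBlockingQueue<>();
        queue.add(Arrays.asList(r1));
        queue.add(Arrays.asList(single));
        queue.add(POISON);
        WorkflowSketch w = new WorkflowSketch();
        w.processInner(queue, new ArrayList<>());
        return new long[] {w.readsProcessedT, w.basesProcessedT, w.readsOutT, w.basesOutT};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [3, 11, 3, 11]
    }
}
```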

Read Validation Strategy

Conditional Validation: Sets Read.VALIDATE_IN_CONSTRUCTOR based on thread count (disabled when Shared.threads() ≥ 4 for performance)
Worker Thread Validation: Each read pair validated via r1.validate(true) and r2.validate(true) in ProcessThread context
Paired-End Handling: Automatic mate detection and processing for Read.mate relationships
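
The validation strategy above moves the cost of checking reads off the input thread once enough workers are available. A minimal sketch of that idea follows. ValidationSketch, configure(), and the nested Read class are hypothetical stand-ins, and the ACGTN check is an invented placeholder for whatever the real validation does.

```java
// Hypothetical stand-in for deferred read validation; not BBTools code.
class ValidationSketch {

    // When true, reads are checked as they are constructed on the input thread.
    static volatile boolean VALIDATE_IN_CONSTRUCTOR = true;

    static class Read {
        final String bases;
        private boolean validated = false;

        Read(String bases) {
            this.bases = bases;
            if (VALIDATE_IN_CONSTRUCTOR) { validate(); }
        }

        boolean validated() { return validated; }

        // Placeholder check (an assumption): only A, C, G, T, and N are accepted.
        void validate() {
            if (validated) { return; }
            for (int i = 0; i < bases.length(); i++) {
                if ("ACGTN".indexOf(bases.charAt(i)) < 0) {
                    throw new IllegalArgumentException("Invalid base: " + bases.charAt(i));
                }
            }
            validated = true;
        }
    }

    // With 4 or more threads, validation is left to the worker threads.
    static void configure(int threads) {
        VALIDATE_IN_CONSTRUCTOR = threads < 4;
    }

    public static void main(String[] args) {
        configure(8);
        Read r = new Read("ACGT");
        System.out.println(r.validated()); // prints false
        r.validate();                      // what a worker thread would do
        System.out.println(r.validated()); // prints true
    }
}
```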

Implementation Note: To create a functional tool, developers must implement the processReadPair(Read r1, Read r2) method in the ProcessThread inner class, replacing the placeholder RuntimeException with actual read processing logic.
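
As one possible illustration of such logic, the sketch below keeps a pair only when every read in it meets a minimum length. It is a hedged example rather than BBTools code: MinLengthFilterSketch, the nested Read stand-in, and MIN_LENGTH are hypothetical, and the boolean return value (true to keep the pair) is an assumption.

```java
// Hypothetical example of custom pair logic; not BBTools code.
class MinLengthFilterSketch {

    // Minimal stand-in for a read.
    static class Read {
        final String bases;
        Read(String bases) { this.bases = bases; }
        int length() { return bases.length(); }
    }

    // Hypothetical threshold; a real tool would parse this from the command line.
    static final int MIN_LENGTH = 4;

    // Returns true if the pair should be kept; r2 is null for unpaired reads.
    static boolean processReadPair(Read r1, Read r2) {
        if (r1.length() < MIN_LENGTH) { return false; }
        return r2 == null || r2.length() >= MIN_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(processReadPair(new Read("ACGTAC"), null));           // prints true
        System.out.println(processReadPair(new Read("ACGTAC"), new Read("AC"))); // prints false
    }
}
```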

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org