A_Sample_MT
A template for creating multi-threaded read processing tools in BBTools. This tool does nothing by itself and is designed to be easily modified into a new program that processes reads in multiple threads.
Basic Usage
a_sample_mt.sh in=<input file> out=<output file>
Input may be fasta or fastq, compressed or uncompressed.
Parameters
Standard Parameters
- in=<file>
- Primary input, or read 1 input.
- in2=<file>
- Read 2 input if reads are in two files.
- out=<file>
- Primary output, or read 1 output.
- out2=<file>
- Read 2 output if reads are in two files.
- ziplevel=2
- Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.
Processing Parameters
No processing parameters defined yet. This is a template for creating new tools.
Java Parameters
- -Xmx
- Set Java's memory usage, overriding autodetection. Examples: -Xmx20g for 20 gigs of RAM, -Xmx200m for 200 megs. The max is typically 85% of physical memory.
- -eoom
- Cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Examples
Basic Usage with Single Input
a_sample_mt.sh in=input.fq out=output.fq
Process a single input file and write to an output file.
Paired-End Input
a_sample_mt.sh in=read1.fq in2=read2.fq out=output1.fq out2=output2.fq
Process paired-end reads from two input files and write to two output files.
Algorithm Details
A_SampleMT implements the Accumulator<ProcessThread> interface and provides a multi-threaded read processing framework with the following architecture:
Thread Management
- Thread Synchronization: Uses ReentrantReadWriteLock for thread-safe accumulation of processing statistics across concurrent ProcessThread instances
- Thread Spawning: Creates ArrayList<ProcessThread> with Shared.threads() worker threads, each processing reads independently via ThreadWaiter.startAndWait()
- Buffer Management: Sets the output buffer size with Tools.mid(16, 128, (Shared.threads()*2)/3), which scales the buffer with thread count while keeping it between 16 and 128
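The spawn-then-accumulate pattern described above can be sketched in plain Java. This is a minimal illustration, not the BBTools source: AccumulatorSketch, Worker, runAll, and the local mid helper are hypothetical stand-ins for A_SampleMT, ProcessThread, ThreadWaiter.startAndWait(), and Tools.mid, and mid is assumed to return the median of its three arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of the spawn-then-accumulate pattern; all names here are illustrative. */
class AccumulatorSketch {

    /** Shared total, written only while holding the write lock. */
    private final ReentrantReadWriteLock rwlock = new ReentrantReadWriteLock();
    private long readsProcessed = 0;

    /** Median of three values, so mid(16, 128, x) clamps x to the range 16..128. */
    static int mid(int a, int b, int c) {
        return Math.max(Math.min(a, b), Math.min(Math.max(a, b), c));
    }

    /** Worker with a private counter; no locking is needed while it runs. */
    static class Worker extends Thread {
        final int items;
        long readsProcessedT = 0;
        Worker(int items) { this.items = items; }
        @Override public void run() {
            for (int i = 0; i < items; i++) { readsProcessedT++; }
        }
    }

    /** Fold one finished worker's counter into the shared total. */
    void accumulate(Worker w) {
        rwlock.writeLock().lock();
        try { readsProcessed += w.readsProcessedT; }
        finally { rwlock.writeLock().unlock(); }
    }

    /** Start all workers, wait for each one, then accumulate its counter. */
    long runAll(int threads, int itemsPerThread) {
        List<Worker> workers = new ArrayList<>(threads);
        for (int i = 0; i < threads; i++) { workers.add(new Worker(itemsPerThread)); }
        for (Worker w : workers) { w.start(); }
        for (Worker w : workers) {
            try {
                w.join(); // join() makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
            accumulate(w);
        }
        rwlock.readLock().lock();
        try { return readsProcessed; }
        finally { rwlock.readLock().unlock(); }
    }
}
```

Under these assumptions, 8 threads give (8*2)/3 = 5, which the clamp raises to 16. The upper bound of 128 only applies once (threads*2)/3 exceeds 128.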
Stream Processing Architecture
- Input Streams: ConcurrentReadInputStream handles ListNum<Read> distribution to worker threads with automatic format detection via FileFormat.testInput()
- Output Streams: ConcurrentReadOutputStream manages thread-safe write operations with configurable ordering and buffering
- File I/O Optimization: Enables ByteFile.FORCE_MODE_BF2 when Shared.threads() > 2 for parallel file reading performance
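When several threads finish batches at different times, ordered output requires putting the batches back in sequence before they are written. The sketch below shows one way this could work. It is not the ConcurrentReadOutputStream implementation: OrderedSinkSketch is a hypothetical class, strings stand in for reads, and each batch is assumed to carry a sequential id starting at 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of order-preserving output for numbered batches; names are illustrative. */
class OrderedSinkSketch {
    private final Map<Long, List<String>> pending = new HashMap<>();
    private final List<String> written = new ArrayList<>();
    private long nextId = 0;

    /** Accept a batch from any thread; emit it once all earlier ids have been emitted. */
    synchronized void add(long id, List<String> batch) {
        pending.put(id, batch);
        List<String> ready;
        // Drain every batch that is now contiguous with what was already written.
        while ((ready = pending.remove(nextId)) != null) {
            written.addAll(ready);
            nextId++;
        }
    }

    /** Snapshot of everything emitted so far, in output order. */
    synchronized List<String> written() {
        return new ArrayList<>(written);
    }
}
```

In this sketch a batch that arrives early is held in memory until its predecessors arrive, so ordered output trades some buffering for deterministic results.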
Processing Workflow
- processInner(): Each ProcessThread calls cris.nextList() to retrieve ListNum<Read> batches
- processList(): Iterates through ArrayList<Read> within each batch, handling paired-end read validation
- processReadPair(): Placeholder method where custom processing logic is implemented (in the template it throws a RuntimeException)
- Statistics Accumulation: Each thread updates its own readsProcessedT, basesProcessedT, readsOutT, and basesOutT counters, which are merged into the shared totals during accumulation
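The workflow above can be sketched with stand-in types. WorkflowSketch and SketchRead are hypothetical, an Iterator stands in for repeated cris.nextList() calls, and both the boolean return value of processReadPair and the use of null to mark a discarded pair are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch of the fetch-batch, process-list, process-pair loop; names are illustrative. */
class WorkflowSketch {
    /** Stand-in for a read; holds only bases and an optional mate. */
    static class SketchRead {
        final String bases;
        final SketchRead mate;
        SketchRead(String bases, SketchRead mate) { this.bases = bases; this.mate = mate; }
        int pairCount() { return mate == null ? 1 : 2; }
        int pairLength() { return bases.length() + (mate == null ? 0 : mate.bases.length()); }
    }

    // Per-thread counters, named after the fields listed above.
    long readsProcessedT = 0, basesProcessedT = 0, readsOutT = 0, basesOutT = 0;

    /** Pull batches until the source is exhausted. */
    void processInner(Iterator<List<SketchRead>> source) {
        while (source.hasNext()) { processList(source.next()); }
    }

    /** Process every read (with its mate) in one batch and count what is kept. */
    void processList(List<SketchRead> reads) {
        for (int i = 0; i < reads.size(); i++) {
            SketchRead r1 = reads.get(i);
            readsProcessedT += r1.pairCount();
            basesProcessedT += r1.pairLength();
            if (processReadPair(r1, r1.mate)) {
                readsOutT += r1.pairCount();
                basesOutT += r1.pairLength();
            } else {
                reads.set(i, null); // assumed convention: null marks a discarded pair
            }
        }
    }

    /** Hypothetical rule for the sketch: keep the pair unless read 1 is empty. */
    boolean processReadPair(SketchRead r1, SketchRead r2) {
        return !r1.bases.isEmpty();
    }
}
```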
Read Validation Strategy
- Conditional Validation: Sets Read.VALIDATE_IN_CONSTRUCTOR based on thread count (disabled when Shared.threads() ≥ 4 for performance)
- Worker Thread Validation: Each read pair validated via r1.validate(true) and r2.validate(true) in ProcessThread context
- Paired-End Handling: Automatic mate detection and processing for Read.mate relationships
Implementation Note: To create a functional tool, developers must implement the processReadPair(Read r1, Read r2) method in the ProcessThread inner class, replacing the placeholder RuntimeException with actual read processing logic.
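As an illustration of what a filled-in processReadPair might look like, the sketch below keeps a pair only when every read in it meets a minimum length. SketchRead is a stand-in rather than the BBTools Read class, and the minLength parameter and the boolean "keep" return value are assumptions, so adapt both to the actual template before use.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of a filled-in processReadPair; all names here are illustrative. */
class LengthFilterSketch {
    /** Stand-in for a read that stores its bases as bytes. */
    static class SketchRead {
        final byte[] bases;
        SketchRead(String s) { bases = s.getBytes(StandardCharsets.US_ASCII); }
        int length() { return bases == null ? 0 : bases.length; }
    }

    private final int minLength; // hypothetical parameter, not defined by the template

    LengthFilterSketch(int minLength) { this.minLength = minLength; }

    /** Returns true to keep the pair; r2 is null for unpaired input. */
    boolean processReadPair(SketchRead r1, SketchRead r2) {
        if (r1.length() < minLength) { return false; }
        return r2 == null || r2.length() >= minLength;
    }
}
```

Because each worker thread would call this method on its own reads, logic like this needs no extra synchronization as long as it only touches the reads passed in and per-thread state.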
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org