MuxByName

Basic Usage

muxbyname.sh in=<file,file,file...> out=<output file>

Input files may also be given without an in= prefix, allowing wildcards:

muxbyname.sh *.fastq out=muxed.fastq

Parameters

Parameters are grouped by function. MuxByName accepts both single-end and paired-end reads and detects the input format automatically.

Standard parameters

in=<file,file>: A comma-separated list of input files. Multiple FASTQ or FASTA files can be given and will be multiplexed together. Files can be gzipped.
in2=<file,file>: Read 2 input if reads are in paired files. For paired-end data, list the mate file corresponding to each in= file.
out=<file>: Primary output, or read 1 output. All multiplexed reads will be written to this file with renamed headers.
out2=<file>: Read 2 output if reads are in paired files. Required when processing paired-end data with separate mate files.
overwrite=f: (ow) Set to false to force the program to abort rather than overwrite an existing file. Default: false (will not overwrite).
showspeed=t: (ss) Set to 'f' to suppress display of processing speed. Shows reads processed per second during execution. Default: true.
ziplevel=2: (zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster. Only applies to gzipped output files. Default: 2.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 400m.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions. Can slightly improve performance in production environments.

Examples

Basic Multiplexing

muxbyname.sh in=sample1.fq,sample2.fq,sample3.fq out=multiplexed.fq

Multiplexes three single-end FASTQ files into one output file. Each read will be renamed with its source filename as a prefix.

Using Wildcards

muxbyname.sh *.fastq out=all_samples.fastq

Multiplexes every file in the current directory that matches *.fastq. The shell expands the wildcard, so the input files are given without the in= prefix.

Paired-End Reads

muxbyname.sh in=sample1_R1.fq,sample2_R1.fq in2=sample1_R2.fq,sample2_R2.fq out=mux_R1.fq out2=mux_R2.fq

Multiplexes paired-end reads from multiple samples while maintaining mate pair relationships.

Hash Symbol Notation for Paired Files

muxbyname.sh sample1_#.fq,sample2_#.fq out=multiplexed_#.fq

Uses hash (#) notation as shorthand for paired files: # is automatically replaced with 1 for the read 1 file and 2 for the read 2 file, so sample1_#.fq stands for sample1_1.fq and sample1_2.fq.

Compressed Files with Custom Settings

muxbyname.sh in=sample1.fq.gz,sample2.fq.gz out=muxed.fq.gz ziplevel=6 overwrite=t

Processes gzipped input files and creates gzipped output at a higher compression level (6 instead of the default 2), allowing existing output files to be overwritten.

Algorithm Details

MuxByName implements read multiplexing and renaming through the RenameAndMux.java class using multithreaded file processing:

Core Implementation

Main Processing Method: Uses renameAndMerge_MT() for multithreaded processing with a thread-pool architecture
Read Renaming Implementation: Renames each read to the pattern "[core_filename]_[numeric_id] [1|2]:", where the core filename is extracted via ReadWrite.stripToCore(in1), the numeric ID is the original Read.numericID value, and 1 or 2 marks read 1 or read 2
File Pattern Expansion: When a path containing # does not exist as a file, expands it to the paired-end files path.replace("#", "1") and path.replace("#", "2")
Stream Processing: Uses ConcurrentReadInputStream for input reading and ConcurrentReadOutputStream for output writing with ListNum<Read> buffering
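
The renaming pattern and the # expansion described above can be sketched in plain Java. This is an illustration only, not the RenameAndMux source: the class MuxNamingSketch and its methods are hypothetical, and its stripToCore is a simplified stand-in that drops directories and everything from the first dot onward, which may differ from what ReadWrite.stripToCore does with unusual filenames. The trailing colon is read here as part of the new read name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-ins only; none of this is BBTools code. */
final class MuxNamingSketch {

    /** Hypothetical stand-in for ReadWrite.stripToCore: drops directories and all dot-suffixes. */
    static String stripToCore(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.indexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Builds "[core]_[numericId] [pairNum]:" where pairNum is 1 or 2. */
    static String rename(String path, long numericId, int pairNum) {
        return stripToCore(path) + "_" + numericId + " " + pairNum + ":";
    }

    /**
     * Expands a path containing '#' into its read 1 and read 2 paths, but only
     * when no file with the literal name exists; otherwise returns it unchanged.
     */
    static List<String> expandHash(String path) {
        List<String> out = new ArrayList<>();
        if (path.indexOf('#') < 0 || new File(path).exists()) {
            out.add(path);
        } else {
            out.add(path.replace("#", "1"));
            out.add(path.replace("#", "2"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rename("data/sample1.fq.gz", 0, 1)); // sample1_0 1:
        System.out.println(rename("data/sample1.fq.gz", 0, 2)); // sample1_0 2:
        System.out.println(expandHash("sample1_#.fq"));
    }
}
```

Both mates of a pair share the same numeric ID and differ only in the 1 or 2 after the space, which is what keeps pairs matched after multiplexing.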

Multithreading Architecture

Thread Management: Creates Shared.threads() MuxThread instances; each claims its next file by calling nextPathNumber.getAndIncrement() on a shared atomic counter
Work Distribution: Each MuxThread processes one file at a time using the renameAndMergeOneFile() method, with thread-safe file assignment
Output Synchronization: Uses a single ConcurrentReadOutputStream shared across all threads, with a buffer size of 4 lists
Error State Management: Tracks failures in a centralized errorState boolean, with synchronized updates across threads
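
The file-claiming scheme above can be illustrated with a small, self-contained sketch. It assumes nothing about MuxThread beyond what is listed: the class WorkClaimSketch and its processAll method are hypothetical, and the per-file work is replaced by a placeholder that records which path was claimed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: workers claim file indices from one shared atomic counter. */
final class WorkClaimSketch {

    /** Starts `threads` workers and returns the paths in the order they were claimed. */
    static List<String> processAll(List<String> paths, int threads) {
        AtomicInteger nextPathNumber = new AtomicInteger(0);
        ConcurrentLinkedQueue<String> claimed = new ConcurrentLinkedQueue<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // getAndIncrement hands each index to exactly one thread, so no
                // file is processed twice and no lock is needed for assignment.
                for (int i = nextPathNumber.getAndIncrement(); i < paths.size();
                        i = nextPathNumber.getAndIncrement()) {
                    claimed.add(paths.get(i)); // placeholder for per-file work
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new ArrayList<>(claimed);
    }

    public static void main(String[] args) {
        List<String> paths = List.of("a.fq", "b.fq", "c.fq", "d.fq", "e.fq");
        System.out.println(processAll(paths, 3)); // order varies from run to run
    }
}
```

Because a thread only asks for a new index when it finishes its current file, threads that draw small files naturally pick up more of them. A consequence of this design is that the order in which files appear in the output is not guaranteed to match the input order.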

File Format Processing

Format Detection: Uses FileFormat.testInput() and FileFormat.testOutput(), with FASTQ as the default format and automatic compression detection
Interleaved Handling: Sets the FASTQ.FORCE_INTERLEAVED and FASTQ.TEST_INTERLEAVED flags based on the number of input and output files
Paired-End Logic: With a single input stream, forces interleaved mode when out2!=null; with two input streams (in2!=null), forces non-interleaved mode
Compression Support: Sets ReadWrite.USE_PIGZ=true and sets the number of zip threads via ReadWrite.setZipThreads((Shared.threads()*3+1)/4)
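
Two of the rules above are easy to check by hand: the zip-thread formula and the interleaving decision. The sketch below works through both. The class FormatSetupSketch, the Interleave enum, and the UNCHANGED outcome for the case with neither in2 nor out2 are assumptions made for illustration; the source describes static flags, not an enum.

```java
/** Illustrative only: the thread arithmetic and pairing decision as pure functions. */
final class FormatSetupSketch {

    /** (threads*3+1)/4 with integer division: about three quarters of the threads, never 0. */
    static int zipThreads(int threads) {
        return (threads * 3 + 1) / 4;
    }

    /** Hypothetical three-way outcome standing in for the FASTQ interleaving flags. */
    enum Interleave { FORCED_ON, FORCED_OFF, UNCHANGED }

    static Interleave decide(String in2, String out2) {
        if (in2 != null) {
            return Interleave.FORCED_OFF; // two input streams: mates arrive in separate files
        }
        if (out2 != null) {
            return Interleave.FORCED_ON;  // one input, two outputs: the input must hold both mates
        }
        return Interleave.UNCHANGED;      // assumption: otherwise the defaults are left alone
    }

    public static void main(String[] args) {
        for (int t : new int[] {1, 4, 8, 16}) {
            // 1 -> 1, 4 -> 3, 8 -> 6, 16 -> 12
            System.out.println(t + " threads -> " + zipThreads(t) + " zip threads");
        }
        System.out.println(decide(null, "mux_R2.fq"));        // FORCED_ON
        System.out.println(decide("sample1_R2.fq", "mux_R2.fq")); // FORCED_OFF
    }
}
```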

Memory and Performance

Memory Management: Uses streaming with AtomicLong counters (readsProcessedA, basesProcessedA) and buffered ListNum processing to minimize memory overhead
I/O Optimization: Sets ByteFile.FORCE_MODE_BF2=true when Shared.threads()>2 to improve multithreaded read performance
Buffer Configuration: Fixed buffer size of 4 lists in ConcurrentReadOutputStream.getStream() for consistent memory usage
Atomic Counters: Uses AtomicLong and AtomicInteger for thread-safe statistics tracking without synchronization overhead
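
To show why a fixed buffer of 4 lists bounds memory, and how atomic counters collect statistics without locks, here is a minimal sketch built on the standard library. It is an analogy, not the ConcurrentReadOutputStream implementation: the class BoundedOutputSketch, the use of ArrayBlockingQueue, and the end-of-stream marker are all assumptions, and reads are represented as plain sequence strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: many producers, one writer, at most 4 lists in flight. */
final class BoundedOutputSketch {
    static final int BUFFER_LISTS = 4; // mirrors the "4 lists" figure above
    private static final List<String> DONE = new ArrayList<>(); // end marker, compared by identity

    final AtomicLong readsProcessed = new AtomicLong();
    final AtomicLong basesProcessed = new AtomicLong();
    final List<String> written = new ArrayList<>(); // touched only by the writer thread

    /** One producer thread per input list; a single writer drains the shared queue. */
    static BoundedOutputSketch mux(List<List<String>> files, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be positive");
        BoundedOutputSketch s = new BoundedOutputSketch();
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(BUFFER_LISTS);
        Thread writer = new Thread(() -> {
            try {
                for (List<String> b = queue.take(); b != DONE; b = queue.take()) {
                    s.written.addAll(b);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        List<Thread> producers = new ArrayList<>();
        for (List<String> file : files) {
            Thread p = new Thread(() -> {
                try {
                    for (int i = 0; i < file.size(); i += batchSize) {
                        int end = Math.min(i + batchSize, file.size());
                        List<String> batch = new ArrayList<>(file.subList(i, end));
                        for (String seq : batch) s.basesProcessed.addAndGet(seq.length());
                        s.readsProcessed.addAndGet(batch.size());
                        queue.put(batch); // blocks while 4 lists are already waiting
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers.add(p);
            p.start();
        }
        try {
            for (Thread p : producers) p.join();
            queue.put(DONE); // all producers finished: tell the writer to stop
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while multiplexing", e);
        }
        return s;
    }

    public static void main(String[] args) {
        BoundedOutputSketch s = mux(List.of(List.of("ACGT", "AC", "A"), List.of("GG", "TTT")), 2);
        System.out.println(s.readsProcessed.get() + " reads, " + s.basesProcessed.get() + " bases");
    }
}
```

When the writer falls behind, producers block on put() instead of accumulating lists, so peak memory depends on the buffer size and batch size rather than on the total input size.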

Validation and Error Handling

Input Validation: Calls Tools.testInputFiles(false, true, in1, in2) to verify file readability before processing
Duplicate Prevention: Uses Tools.testForDuplicateFiles(true, in1, in2, out1, out2) to prevent file conflicts
Output Verification: Validates output writability via Tools.testOutputFiles(overwrite, false, false, out1, out2)
Progress Reporting: Calls Tools.timeReadsBasesProcessed() to report elapsed time and the number of reads and bases processed
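
The three pre-flight checks above can be approximated with standard-library calls. The sketch below is a rough equivalent under stated assumptions, not the Tools class: PreflightSketch and its method names are hypothetical, duplicate detection compares path strings literally rather than resolving them to canonical files, and null arguments (an unused in2 or out2) are skipped.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: simplified versions of the input, duplicate, and output checks. */
final class PreflightSketch {

    /** Returns the first path that appears more than once, or null if all are distinct. */
    static String firstDuplicate(String... paths) {
        Set<String> seen = new HashSet<>();
        for (String p : paths) {
            if (p != null && !seen.add(p)) return p;
        }
        return null;
    }

    /** True if every non-null input exists and can be read. */
    static boolean inputsReadable(String... paths) {
        for (String p : paths) {
            if (p != null && !Files.isReadable(Paths.get(p))) return false;
        }
        return true;
    }

    /** True if no non-null output already exists, or if overwriting is allowed. */
    static boolean outputsWritable(boolean overwrite, String... paths) {
        for (String p : paths) {
            if (p != null && !overwrite && Files.exists(Paths.get(p))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Using the same file as input and output is the conflict worth catching early.
        System.out.println(firstDuplicate("reads.fq", null, "reads.fq", null)); // reads.fq
    }
}
```

Running all three checks before any thread starts means a bad path fails fast, before partial output has been written.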

Relationship to DemuxByName: This tool performs the inverse operation of demuxbyname.sh. Where demuxbyname.sh splits a multiplexed file based on read names, muxbyname.sh combines separate files and adds an identifying prefix to each read name. The prefix is the source file's core filename and the original numeric ID is preserved, as implemented in the renameAndMergeOneFile() method.

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org