MuxByName
Multiplexes reads from multiple files into a single output, renaming each read after the file it came from. Opposite of demuxbyname.
Basic Usage
muxbyname.sh in=<file,file,file...> out=<output file>
Input files may also be given without an in= prefix, allowing wildcards:
muxbyname.sh *.fastq out=muxed.fastq
Parameters
Parameters are organized into logical groups based on their function in the multiplexing process. MuxByName supports both single-end and paired-end reads with automatic format detection.
Standard parameters
- in=<file,file>
- A list of input files separated by commas. Can specify multiple FASTQ or FASTA files to be multiplexed together. Files can be gzipped.
- in2=<file,file>
- Read 2 input if reads are in paired files. Specify corresponding mate files for paired-end sequencing data.
- out=<file>
- Primary output, or read 1 output. All multiplexed reads will be written to this file with renamed headers.
- out2=<file>
- Read 2 output if reads are in paired files. Required when processing paired-end data with separate mate files.
- overwrite=f
- (ow) Set to false to force the program to abort rather than overwrite an existing file. Default: false (will not overwrite).
- showspeed=t
- (ss) Set to 'f' to suppress display of processing speed. Shows reads processed per second during execution. Default: true.
- ziplevel=2
- (zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster. Only applies to gzipped output files. Default: 2.
Java Parameters
- -Xmx
- This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 400m.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions. Can slightly improve performance in production environments.
Examples
Basic Multiplexing
muxbyname.sh in=sample1.fq,sample2.fq,sample3.fq out=multiplexed.fq
Multiplexes three single-end FASTQ files into one output file. Each read will be renamed with its source filename as a prefix.
Using Wildcards
muxbyname.sh *.fastq out=all_samples.fastq
Multiplexes all FASTQ files in the current directory. Input files are specified without the in= prefix when using wildcards.
Paired-End Reads
muxbyname.sh in=sample1_R1.fq,sample2_R1.fq in2=sample1_R2.fq,sample2_R2.fq out=mux_R1.fq out2=mux_R2.fq
Multiplexes paired-end reads from multiple samples while maintaining mate pair relationships.
Hash Symbol Notation for Paired Files
muxbyname.sh sample1_#.fq,sample2_#.fq out=multiplexed_#.fq
Uses hash symbol (#) notation where # is automatically replaced with 1 and 2 for paired files. This is a shorthand for specifying paired-end files.
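The expansion described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the check that the literal path does not exist is taken from the Algorithm Details description rather than from the actual source.

```java
import java.io.File;

final class HashExpansionSketch {

    // Returns {read1Path, read2Path}. read2Path is null when no expansion applies.
    // Hypothetical helper, not part of BBTools.
    static String[] expand(String path) {
        // Expand only when the path contains '#' and no file with that literal name exists.
        if (path.indexOf('#') >= 0 && !new File(path).exists()) {
            return new String[] {path.replace("#", "1"), path.replace("#", "2")};
        }
        return new String[] {path, null};
    }

    public static void main(String[] args) {
        String[] pair = expand("no_such_dir/sample1_#.fq");
        System.out.println(pair[0]);
        System.out.println(pair[1]);
    }
}
```

Under these assumptions, a path without # is passed through unchanged as a single-file input.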
Compressed Files with Custom Settings
muxbyname.sh in=sample1.fq.gz,sample2.fq.gz out=muxed.fq.gz ziplevel=6 overwrite=t
Processes gzipped input files and writes gzipped output at a higher compression level, overwriting the output file if it already exists.
Algorithm Details
MuxByName implements read multiplexing and renaming through the RenameAndMux.java class using multithreaded file processing:
Core Implementation
- Main Processing Method: Uses renameAndMerge_MT() for multithreaded processing with thread pool architecture
- Read Renaming Implementation: Applies the pattern [core_filename]_[numeric_id] [1|2]: where the core filename is extracted via ReadWrite.stripToCore(in1) and numeric IDs preserve the original Read.numericID values
- File Pattern Expansion: Handles # symbol replacement via path.replace("#", "1") and path.replace("#", "2") for paired-end files when the original file doesn't exist
- Stream Processing: Uses ConcurrentReadInputStream for input reading and ConcurrentReadOutputStream for output writing with ListNum<Read> buffering
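To make the renaming pattern concrete, here is a minimal Java sketch. The stripToCore() below is a hypothetical stand-in written for this example. It drops the directory, one compression extension, and one format extension, while the real ReadWrite.stripToCore() may handle extensions differently. The class name and the rename() helper are likewise assumptions.

```java
final class RenameSketch {

    // Hypothetical approximation of ReadWrite.stripToCore(); not the BBTools implementation.
    static String stripToCore(String path) {
        // Drop any leading directory components.
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        // Drop a trailing compression extension, if present.
        for (String ext : new String[] {".gz", ".bz2"}) {
            if (name.endsWith(ext)) {
                name = name.substring(0, name.length() - ext.length());
                break;
            }
        }
        // Drop one remaining extension such as .fq or .fastq.
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Builds a header following the pattern [core_filename]_[numeric_id] [1|2]:
    static String rename(String core, long numericId, int pairNum) {
        return core + "_" + numericId + " " + pairNum + ":";
    }

    public static void main(String[] args) {
        String core = stripToCore("data/sample1.fq.gz");
        System.out.println(rename(core, 0, 1)); // header for read 1 of the first pair
        System.out.println(rename(core, 0, 2)); // header for its mate
    }
}
```

Both mates of a pair share the same core name and numeric ID and differ only in the trailing 1: or 2: field.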
Multithreading Architecture
- Thread Management: Creates Shared.threads() MuxThread instances, each of which claims its next file through the atomic counter nextPathNumber.getAndIncrement()
- Work Distribution: Each MuxThread processes one file at a time using renameAndMergeOneFile() method with thread-safe file assignment
- Output Synchronization: Uses a single ConcurrentReadOutputStream shared across all threads, with a buffer size of 4 lists
- Error State Management: Implements centralized errorState boolean with synchronized error tracking across threads
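The work-distribution scheme above, in which threads claim files through a shared atomic counter, can be illustrated with a small self-contained Java sketch. Only the counter name nextPathNumber comes from the description. The class, the processAll() method, and the placeholder for per-file work are assumptions made for this example, and the shared output stream is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkDistributionSketch {

    // Hands out each path to exactly one worker thread and returns the paths that were claimed.
    static List<String> processAll(List<String> paths, int threads) {
        AtomicInteger nextPathNumber = new AtomicInteger(0);
        List<String> claimed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                // getAndIncrement() returns a distinct index per call, so no file is claimed twice.
                for (int i = nextPathNumber.getAndIncrement(); i < paths.size();
                        i = nextPathNumber.getAndIncrement()) {
                    claimed.add(paths.get(i)); // placeholder for the per-file rename and merge work
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        Collections.addAll(paths, "a.fq", "b.fq", "c.fq", "d.fq", "e.fq");
        System.out.println(processAll(paths, 3).size() + " files claimed");
    }
}
```

Because files are claimed dynamically, a thread that finishes a small file simply takes the next unclaimed one. The order in which files are claimed is not deterministic.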
File Format Processing
- Format Detection: Uses FileFormat.testInput() and FileFormat.testOutput() with FASTQ default format and automatic compression detection
- Interleaved Handling: Controls FASTQ.FORCE_INTERLEAVED and FASTQ.TEST_INTERLEAVED flags based on input/output file count
- Paired-End Logic: Automatically sets interleaved mode when out2!=null for single input stream, or non-interleaved when in2!=null for dual input streams
- Compression Support: Leverages ReadWrite.USE_PIGZ=true with configurable zip threads via ReadWrite.setZipThreads((Shared.threads()*3+1)/4)
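The zip-thread expression in the last item uses integer division, so it is worth working through a few values. The sketch below simply evaluates (threads*3+1)/4 for some thread counts. The class and method names are hypothetical, and the thread count is passed in directly rather than read from Shared.threads().

```java
final class ZipThreadSketch {

    // Evaluates the expression quoted above using Java integer arithmetic.
    static int zipThreads(int threads) {
        return (threads * 3 + 1) / 4;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(threads + " threads -> " + zipThreads(threads) + " zip threads");
        }
        // Prints 1, 1, 3, 6 and 12 zip threads for 1, 2, 4, 8 and 16 threads respectively.
    }
}
```

In effect, roughly three quarters of the available threads are assigned to compression, with a minimum of one.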
Memory and Performance
- Memory Management: Uses streaming with AtomicLong counters (readsProcessedA, basesProcessedA) and buffered ListNum processing to minimize memory overhead
- I/O Optimization: Implements ByteFile.FORCE_MODE_BF2=true when Shared.threads()>2 for enhanced threading performance
- Buffer Configuration: Fixed buffer size of 4 lists in ConcurrentReadOutputStream.getStream() for consistent memory usage
- Atomic Counters: Uses AtomicLong and AtomicInteger for thread-safe statistics tracking without synchronization overhead
Validation and Error Handling
- Input Validation: Calls Tools.testInputFiles(false, true, in1, in2) to verify file readability before processing
- Duplicate Prevention: Uses Tools.testForDuplicateFiles(true, in1, in2, out1, out2) to prevent file conflicts
- Output Verification: Validates output writability via Tools.testOutputFiles(overwrite, false, false, out1, out2)
- Progress Reporting: Implements Tools.timeReadsBasesProcessed() for real-time processing statistics
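As an illustration of the duplicate-prevention idea only, the following Java sketch rejects any file name that appears more than once among the inputs and outputs. It is not the Tools.testForDuplicateFiles() implementation: the class and method are hypothetical, and paths are compared as plain strings, which may differ from how BBTools compares them.

```java
import java.util.HashSet;
import java.util.Set;

final class DuplicateCheckSketch {

    // Returns true when no non-null path occurs more than once.
    static boolean noDuplicates(String... paths) {
        Set<String> seen = new HashSet<>();
        for (String path : paths) {
            // Set.add() returns false if the path was already present.
            if (path != null && !seen.add(path)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Unused slots such as a missing in2 or out2 are passed as null and ignored.
        System.out.println(noDuplicates("s1.fq", "s2.fq", null, "muxed.fq", null));
        System.out.println(noDuplicates("s1.fq", "s2.fq", null, "s1.fq", null));
    }
}
```

A check of this kind catches mistakes such as naming an input file as the output, which would otherwise overwrite data while it is being read.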
Relationship to DemuxByName: This tool performs the inverse operation of demuxbyname.sh. While demuxbyname splits multiplexed files based on read names, muxbyname combines separate files and prefixes each read name with an identifier derived from its source file. The core filename extraction and numeric ID preservation are implemented in the renameAndMergeOneFile() method.
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org