ReplaceHeaders

Basic Usage

replaceheaders.sh in=<file> hin=<headers file> out=<out file>

This tool replaces the headers (names) of sequences in one file with headers from another file. The input and header files must have the same number of sequences in the same order.

Parameters

Parameters are organized by function. The tool supports standard BBTools I/O parameters along with specific header replacement options.

I/O parameters

in=: Input sequences file. Use in2 for a second paired file. This is the file whose headers will be replaced.
hin=: Header input sequences file. Use hin2 for a second paired file. This file contains the replacement headers. Can be sequences or plain text with one name per line (use .header extension for plain text).
out=: Output sequences file. Use out2 for a second paired file. Will contain the original sequences with the new headers.
ow=f: (overwrite) Overwrites files that already exist. Set to true to allow overwriting existing output files.
zl=4: (ziplevel) Set compression level, 1 (low) to 9 (max). Controls gzip compression level for output files.
int=f: (interleaved) Determines whether INPUT file is considered interleaved. Set to true if input contains paired reads in a single file.
fastawrap=70: Length of lines in fasta output. Controls line wrapping for FASTA format output sequences.
qin=auto: ASCII offset for input quality scores. May be 33 (Sanger), 64 (Illumina), or auto to detect automatically.
qout=auto: ASCII offset for output quality scores. May be 33 (Sanger), 64 (Illumina), or auto (same as input).
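To illustrate what qin and qout control, here is a minimal Java sketch that re-encodes a FASTQ quality string from one ASCII offset to another. It is not BBTools code, and the class name QualityOffset is hypothetical. It assumes the Phred scores themselves are unchanged and only the offset shifts.

```java
/** Hypothetical helper illustrating qin/qout: re-encode FASTQ quality characters between ASCII offsets. */
final class QualityOffset {
    private QualityOffset() {}

    /**
     * Converts a quality string written with offset qin (33 or 64) to offset qout.
     * Each character encodes a Phred score as (char - qin); the score is re-encoded as (score + qout).
     */
    static String convert(String quals, int qin, int qout) {
        if ((qin != 33 && qin != 64) || (qout != 33 && qout != 64)) {
            throw new IllegalArgumentException("Offsets must be 33 or 64");
        }
        StringBuilder sb = new StringBuilder(quals.length());
        for (int i = 0; i < quals.length(); i++) {
            int phred = quals.charAt(i) - qin; // decode the Phred score
            if (phred < 0) {
                throw new IllegalArgumentException("Quality char below offset " + qin + " at position " + i);
            }
            sb.append((char) (phred + qout)); // re-encode with the output offset
        }
        return sb.toString();
    }
}
```

For example, Phred 40 is 'h' at offset 64 and 'I' at offset 33, so convert("hhhh", 64, 33) yields "IIII".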

Renaming mode parameters (if not default)

addprefix=f: Rename the read by prepending the new name to the existing name. When true, the new and old headers are joined with a single space (new_header old_header). When false (default), the old header is replaced completely.

Sampling parameters

reads=-1: Set to a positive number to only process this many INPUT reads (or pairs), then quit. Use -1 (default) to process all reads.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 GB of RAM, and -Xmx200m will specify 200 MB. The maximum is typically 85% of physical memory.
-eoom: This flag will cause the process to exit if an out-of-memory error occurs. Requires Java 8u92+.
-da: Disable assertions. May provide slight performance improvement in production use.

Examples

Basic Header Replacement

replaceheaders.sh in=reads.fq hin=newnames.fq out=renamed_reads.fq

Replaces all headers in reads.fq with headers from newnames.fq, writing output to renamed_reads.fq.

Using Plain Text Header File

replaceheaders.sh in=sequences.fa hin=names.header out=renamed.fa

Uses a plain text file (names.header) containing one name per line to replace sequence headers. The .header extension tells the tool to treat it as plain text rather than sequence format.

Paired-end Files

replaceheaders.sh in=reads_1.fq in2=reads_2.fq hin=names_1.fq hin2=names_2.fq out=renamed_1.fq out2=renamed_2.fq

Processes paired-end files, replacing headers in both R1 and R2 files with corresponding headers from the header files.

Prefix Mode

replaceheaders.sh in=reads.fq hin=prefixes.header out=prefixed_reads.fq addprefix=t

Prepends new names to existing headers instead of replacing them completely. Results in headers like "newname originalname".

Limited Processing

replaceheaders.sh in=large_dataset.fq hin=new_names.fq out=sample.fq reads=1000

Only processes the first 1000 reads from the input file, useful for testing or sampling large datasets.

Algorithm Details

ReplaceHeaders uses ConcurrentReadInputStream with synchronized dual-stream processing to handle sequence and header files simultaneously:

Processing Strategy

ConcurrentReadInputStream Architecture: Creates two independent ConcurrentReadInputStream instances (cris for sequences, hcris for headers) with parallel ListNum processing for thread-safe batch operations
Pairedness Validation: Compares cris.paired() and hcris.paired() states in constructor, terminating with KillSwitch.kill() if read and header files have mismatched pairedness
ListNum Synchronization: Fetches ListNum batches from the sequence and header streams in lockstep and requires them to be the same size, terminating with KillSwitch.kill() when hreads.size() != reads.size()
Buffer Management: Implements Shared.setBufferLen(1) with Shared.capBuffers(4) for optimized memory allocation and concurrent stream processing
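The lockstep pattern described above can be sketched with plain iterators standing in for the two input streams. This is a hypothetical illustration using only the Java standard library, not the ConcurrentReadInputStream API. The names LockstepBatches and processInLockstep are invented, and IllegalStateException stands in for KillSwitch.kill().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical lockstep batch reader; not the BBTools ConcurrentReadInputStream API. */
final class LockstepBatches {
    private LockstepBatches() {}

    /** Reads up to batchSize items; an empty list means the iterator is exhausted. */
    static <T> List<T> nextBatch(Iterator<T> it, int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && it.hasNext()) {
            batch.add(it.next());
        }
        return batch;
    }

    /**
     * Pulls one batch from each stream per iteration and hands item i of the read batch to the
     * consumer together with item i of the header batch. Only one batch per stream is held at a
     * time, so memory depends on batchSize rather than on file size.
     */
    static <R, H> void processInLockstep(Iterator<R> reads, Iterator<H> headers,
                                         boolean readsPaired, boolean headersPaired,
                                         int batchSize, BiConsumer<R, H> action) {
        if (readsPaired != headersPaired) {
            // Analogous to the pairedness check done in the constructor.
            throw new IllegalStateException("Read and header files have mismatched pairedness");
        }
        while (true) {
            List<R> r = nextBatch(reads, batchSize);
            List<H> h = nextBatch(headers, batchSize);
            if (r.size() != h.size()) {
                // Analogous to the hreads.size() != reads.size() check.
                throw new IllegalStateException(
                        "Count mismatch: " + r.size() + " reads vs " + h.size() + " headers in batch");
            }
            if (r.isEmpty()) {
                return; // both streams ended at the same point
            }
            for (int i = 0; i < r.size(); i++) {
                action.accept(r.get(i), h.get(i));
            }
        }
    }
}
```

A count mismatch is detected in the first batch where the two files diverge, so a header file that is one record short fails near the end of the run rather than at startup.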

Header Replacement Logic

processReadPair() Method: Core replacement logic in single method handling both prefix and direct replacement modes
Direct Replacement (prefix=false): Simple assignment r1.id=h1.id and r2.id=h2.id for complete header substitution
Prefix Mode (prefix=true): String concatenation r1.id=h1.id+" "+r1.id with space separator between new and original headers
Paired-end Consistency: Handles both r1/r2 and h1/h2 pairs simultaneously, maintaining mate relationship structure
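The replacement rules listed above amount to a few lines of code. The following Java sketch assumes a hypothetical SimpleRead record with an id field and a mate link. The real BBTools Read class is much richer, and only the id assignments mirror the documented logic.

```java
/** Hypothetical minimal read record; BBTools' real Read class has many more fields. */
final class SimpleRead {
    String id;
    final String bases;
    SimpleRead mate; // null for unpaired reads

    SimpleRead(String id, String bases) {
        this.id = id;
        this.bases = bases;
    }
}

final class HeaderReplacer {
    private HeaderReplacer() {}

    /** Replaces the name outright, or prepends the new name plus a space when prefix is true. */
    static void rename(SimpleRead r, SimpleRead h, boolean prefix) {
        r.id = prefix ? h.id + " " + r.id : h.id;
    }

    /** Mirrors the described processReadPair(): r1 takes h1's name, and r2 (if present) takes h2's. */
    static void processReadPair(SimpleRead r1, SimpleRead h1, boolean prefix) {
        rename(r1, h1, prefix);
        SimpleRead r2 = r1.mate;
        SimpleRead h2 = h1.mate;
        // Pairedness was validated up front, so r2 and h2 are either both present or both absent.
        if (r2 != null && h2 != null) {
            rename(r2, h2, prefix);
        }
    }
}
```

Only the id changes; bases (and, in a real read, qualities) pass through untouched, which is why output sequences are identical to the input apart from their headers.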

File Format Support

FileFormat Detection: Uses FileFormat.testInput() with FASTQ and HEADER format constants for automatic format recognition
Header File Processing: FileFormat.HEADER type handles .header extension files as plain text with single name per line parsing
Stream Integration: FastaReadInputStream with ConcurrentReadInputStream wrapper provides unified sequence parsing across FASTA/FASTQ formats
Quality Score Processing: Parser.processQuality() handles qin/qout ASCII offset conversion for FASTQ quality preservation
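How a header source might yield one name per record can be sketched as follows. This hypothetical Java helper (HeaderSource.namesFrom) decides the format from the file extension alone, whereas BBTools uses FileFormat.testInput(). It ignores compressed inputs, and skipping blank lines in .header files is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical extension-based name extraction; BBTools uses FileFormat.testInput() instead. */
final class HeaderSource {
    private HeaderSource() {}

    static List<String> namesFrom(String fileName, List<String> lines) {
        String lower = fileName.toLowerCase();
        List<String> names = new ArrayList<>();
        if (lower.endsWith(".header")) {
            // Plain text: one name per line, used verbatim (blank lines skipped in this sketch).
            for (String line : lines) {
                if (!line.isEmpty()) {
                    names.add(line);
                }
            }
        } else if (lower.endsWith(".fq") || lower.endsWith(".fastq")) {
            // FASTQ: the header is the first line of each 4-line record, minus the leading '@'.
            for (int i = 0; i < lines.size(); i += 4) {
                String line = lines.get(i);
                if (!line.startsWith("@")) {
                    throw new IllegalArgumentException("Bad FASTQ header at line " + (i + 1));
                }
                names.add(line.substring(1));
            }
        } else {
            // Otherwise assume FASTA: each '>' line starts a record; wrapped sequence lines are ignored.
            for (String line : lines) {
                if (line.startsWith(">")) {
                    names.add(line.substring(1));
                }
            }
        }
        return names;
    }
}
```

In practice the lines would come from something like Files.readAllLines(Path.of("names.header")). Note that the whole header line after '>' or '@', including any description, becomes the new name.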

Performance Characteristics

O(n) Processing: Single-pass through both files with processInner() method iterating once through all reads
Streaming Memory Model: Uses ListNum batches with cris.nextList() and hcris.nextList(), so memory use is bounded by batch and buffer sizes rather than by file size
Concurrent I/O: ConcurrentReadOutputStream.getStream() with configurable buffer=4 enables overlapped read/write operations
Compression Handling: ReadWrite.USE_PIGZ=true enables parallel gzip with ReadWrite.setZipThreads() for transparent compression support
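The zl parameter sets a gzip compression level. As a single-threaded stand-in for pigz, the following Java sketch sets that level on java.util.zip's GZIPOutputStream through its inherited Deflater. The class LeveledGzip is hypothetical and does not reproduce BBTools' parallel compression path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Single-threaded gzip writer whose Deflater level plays the role of zl=N; not pigz, not BBTools code. */
final class LeveledGzip extends GZIPOutputStream {
    LeveledGzip(OutputStream out, int level) throws IOException {
        super(out);
        def.setLevel(level); // 'def' is the protected Deflater inherited from DeflaterOutputStream
    }

    /** Compresses text at the given level (1 = fastest, 9 = smallest output). */
    static byte[] compress(String text, int level) {
        if (level < 1 || level > 9) {
            throw new IllegalArgumentException("Compression level must be 1..9, got " + level);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (LeveledGzip gz = new LeveledGzip(bytes, level)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads a gzip byte array back into a UTF-8 string. */
    static String decompress(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whatever the level, the output is ordinary gzip that any reader can decompress. The level only trades CPU time for file size, which is why a moderate default such as zl=4 is a reasonable compromise.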

Error Handling

KillSwitch Integration: Uses KillSwitch.kill() for immediate termination on count mismatches or pairedness validation failures
Tools.testInputFiles(): Pre-validates all input files exist and are readable before stream creation
Tools.testOutputFiles(): Verifies output file write permissions with overwrite/append flag handling
Tools.testForDuplicateFiles(): Prevents file conflicts by checking for duplicate file specifications across inputs/outputs
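The three pre-flight checks above can be approximated with java.nio.file. This is a hypothetical sketch (PreflightChecks.check), not the signatures of Tools.testInputFiles(), Tools.testOutputFiles(), or Tools.testForDuplicateFiles(). It compares normalized absolute paths and does not resolve symbolic links.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-flight checks modeled on the description above; not the BBTools Tools API. */
final class PreflightChecks {
    private PreflightChecks() {}

    static void check(List<Path> inputs, List<Path> outputs, boolean overwrite) {
        // 1. Every input must exist as a regular file and be readable.
        for (Path in : inputs) {
            if (!Files.isRegularFile(in) || !Files.isReadable(in)) {
                throw new IllegalArgumentException("Cannot read input file: " + in);
            }
        }
        // 2. Outputs must not already exist unless overwrite (ow=t) is set, and must be writable.
        for (Path out : outputs) {
            if (Files.exists(out)) {
                if (!overwrite) {
                    throw new IllegalArgumentException(out + " already exists; set ow=t to overwrite");
                }
                if (!Files.isWritable(out)) {
                    throw new IllegalArgumentException("Cannot write output file: " + out);
                }
            } else {
                Path dir = out.toAbsolutePath().getParent();
                if (dir != null && !Files.isWritable(dir)) {
                    throw new IllegalArgumentException("Cannot create output file in: " + dir);
                }
            }
        }
        // 3. No path may appear twice; normalize so "a.fq" and "./a.fq" compare equal.
        List<Path> all = new ArrayList<>(inputs);
        all.addAll(outputs);
        Set<Path> seen = new HashSet<>();
        for (Path p : all) {
            if (!seen.add(p.toAbsolutePath().normalize())) {
                throw new IllegalArgumentException("File specified more than once: " + p);
            }
        }
    }
}
```

Running all checks before any stream is opened means a typo in a file name fails immediately, instead of after part of the output has been written.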

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org