ReplaceHeaders

Script: replaceheaders.sh Package: jgi Class: ReplaceHeaders.java

Replaces read names with names from another file. The other file can either be sequences or simply names, with one name per line (and no > or @ symbols). If you use one name per line, please give the file a .header extension.
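
If the replacement names currently sit in a FASTA file, a plain-text name list can be derived with standard Unix tools. The sketch below is not part of BBTools; the function name fasta_to_header and all file names are hypothetical, and it assumes FASTA input (FASTQ would need different handling because quality lines can also start with @).

```shell
#!/bin/sh
# Hypothetical helper (not a BBTools command): write the names from a FASTA
# file, one per line, without the leading '>' symbol.
fasta_to_header() {
    # $1 = FASTA input, $2 = plain-text output (give it a .header extension)
    grep '^>' "$1" | sed 's/^>//' > "$2"
}

# Demo on a tiny FASTA file created in a temporary directory.
demo_dir=$(mktemp -d)
printf '>seq1 sample A\nACGT\n>seq2 sample B\nGGCC\n' > "$demo_dir/source.fa"
fasta_to_header "$demo_dir/source.fa" "$demo_dir/names.header"

cat "$demo_dir/names.header"
# prints:
# seq1 sample A
# seq2 sample B
```

The resulting names.header could then be passed as hin=names.header.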

Basic Usage

replaceheaders.sh in=<file> hin=<headers file> out=<out file>

This tool replaces the headers (names) of sequences in one file with headers from another file. The input and header files must have the same number of sequences in the same order.
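
Because the two files must contain the same number of entries, it can be worth counting both before running the tool. The following is a minimal sketch, assuming a FASTA sequence file and a plain-text .header file; the function name same_count and the file names are hypothetical and not part of BBTools.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a BBTools command).
# Succeeds only if the FASTA file and the .header file have equal counts.
same_count() {
    # $1 = FASTA file, $2 = plain-text header file
    n_seq=$(grep -c '^>' "$1")            # one '>' line per FASTA record
    n_hdr=$(awk 'END { print NR }' "$2")  # one name per line
    [ "$n_seq" -eq "$n_hdr" ]
}

# Demo with two sequences and two names.
work=$(mktemp -d)
printf '>a\nACGT\n>b\nGGCC\n' > "$work/seqs.fa"
printf 'name1\nname2\n' > "$work/names.header"

if same_count "$work/seqs.fa" "$work/names.header"; then
    echo "counts match"
else
    echo "counts differ" >&2
fi
```

This only compares counts; it cannot confirm that the entries are in the same order.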

Parameters

Parameters are organized by function. The tool supports standard BBTools I/O parameters along with specific header replacement options.

in=
Input sequences file. Use in2 for a second paired file. This is the file whose headers will be replaced.
hin=
Header input file containing the replacement headers. Use hin2 for a second paired file. It can be a sequence file or a plain-text file with one name per line (use the .header extension for plain text).
out=
Output sequences file. Use out2 for a second paired file. Will contain the original sequences with the new headers.
ow=f
(overwrite) Set to true to overwrite output files that already exist.
zl=4
(ziplevel) Set compression level, 1 (low) to 9 (max). Controls gzip compression level for output files.
int=f
(interleaved) Determines whether INPUT file is considered interleaved. Set to true if input contains paired reads in a single file.
fastawrap=70
Length of lines in fasta output. Controls line wrapping for FASTA format output sequences.
qin=auto
ASCII offset for input quality scores. May be 33 (Sanger), 64 (Illumina), or auto to detect automatically.
qout=auto
ASCII offset for output quality scores. May be 33 (Sanger), 64 (Illumina), or auto (same as input).

Renaming mode parameters (if not default)

addprefix=f
Rename the read by prepending the new name to the existing name. When true, combines new and old headers (new_header old_header). When false (default), completely replaces the old header.

Sampling parameters

reads=-1
Set to a positive number to only process this many INPUT reads (or pairs), then quit. Use -1 (default) to process all reads.

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions. May provide slight performance improvement in production use.

Examples

Basic Header Replacement

replaceheaders.sh in=reads.fq hin=newnames.fq out=renamed_reads.fq

Replaces all headers in reads.fq with headers from newnames.fq, writing output to renamed_reads.fq.

Using Plain Text Header File

replaceheaders.sh in=sequences.fa hin=names.header out=renamed.fa

Uses a plain text file (names.header) containing one name per line to replace sequence headers. The .header extension tells the tool to treat it as plain text rather than sequence format.
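
A plain-text header file should not carry the > or @ markers used in sequence files. The check below is a rough sketch that assumes the restriction refers to a leading marker character; check_header_file and the file names are hypothetical and not part of BBTools.

```shell
#!/bin/sh
# Hypothetical sanity check (not a BBTools command).
# Succeeds only if no line of the file starts with '>' or '@'.
check_header_file() {
    # $1 = plain-text header file
    grep -q '^[>@]' "$1"
    # grep exits 1 when nothing matched; 0 means a marker was found,
    # and 2 means an error such as a missing file.
    [ $? -eq 1 ]
}

# Demo: one clean file and one that still has a FASTA marker.
tmp=$(mktemp -d)
printf 'contig_1\ncontig_2\n' > "$tmp/good.header"
printf '>contig_1\ncontig_2\n' > "$tmp/bad.header"

check_header_file "$tmp/good.header" && echo "good.header: ok"
check_header_file "$tmp/bad.header" || echo "bad.header: leading > or @ found"
```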

Paired-end Files

replaceheaders.sh in=reads_1.fq in2=reads_2.fq hin=names_1.fq hin2=names_2.fq out=renamed_1.fq out2=renamed_2.fq

Processes paired-end files, replacing headers in both R1 and R2 files with corresponding headers from the header files.

Prefix Mode

replaceheaders.sh in=reads.fq hin=prefixes.header out=prefixed_reads.fq addprefix=t

Prepends new names to existing headers instead of replacing them completely. Results in headers like "newname originalname".
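
To make the difference between the two renaming modes concrete, the following awk sketch emulates them on a small FASTA file. It is an illustration only and not how ReplaceHeaders is implemented; rename_fasta and the file names are hypothetical, and the single-space separator is assumed from the "newname originalname" form described above.

```shell
#!/bin/sh
# Illustration only: emulate replace mode and prefix mode for FASTA with awk.
rename_fasta() {
    # $1 = FASTA file, $2 = non-empty .header file,
    # $3 = "t" to prepend the new name, anything else to replace
    awk -v prefix="$3" '
        NR == FNR { name[NR] = $0; next }   # first file: load the new names
        /^>/ {                              # second file: rewrite header lines
            n++
            old = substr($0, 2)             # existing name without the marker
            if (prefix == "t") print ">" name[n] " " old
            else               print ">" name[n]
            next
        }
        { print }                           # sequence lines pass through
    ' "$2" "$1"
}

# Demo input: two records and two replacement names.
d=$(mktemp -d)
printf '>old1\nACGT\n>old2\nTTGG\n' > "$d/reads.fa"
printf 'new1\nnew2\n' > "$d/names.header"

rename_fasta "$d/reads.fa" "$d/names.header" t
# prints:
# >new1 old1
# ACGT
# >new2 old2
# TTGG
```

Calling the same function with f as the third argument prints the headers >new1 and >new2 alone, mirroring the default replacement behavior.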

Limited Processing

replaceheaders.sh in=large_dataset.fq hin=new_names.fq out=sample.fq reads=1000

Only processes the first 1000 reads from the input file, useful for testing or sampling large datasets.

Algorithm Details

ReplaceHeaders uses ConcurrentReadInputStream to read the sequence file and the header file as two synchronized streams, processing them simultaneously.

Processing Strategy

Header Replacement Logic

File Format Support

Performance Characteristics

Error Handling

Support

For questions and support: