Rename
Renames reads to <prefix>_<number>, where you supply the prefix and the numbers are assigned in input order. Other renaming modes are also available. Paired reads should be processed together, and the interleaved flag should be set for interleaved input, so that any read number added to the name (such as 1: or 2:) is assigned correctly.
Basic Usage
rename.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> prefix=<prefix>
in2 and out2 are for paired reads and are optional. If input is paired and there is only one output file, it will be written interleaved.
Parameters
Parameters are organized by function. The tool supports sequential numbering, custom prefixes, coordinate-based renaming, and header trimming operations.
Parameters
- prefix=
- The string used as the name prefix. In the default mode it produces names like "prefix_1", "prefix_2", etc.; with addprefix=t it is prepended to the existing read name instead.
- suffix=
- If a suffix is supplied, it will be appended to the existing read name, after a tab character. Useful for adding metadata.
- ow=f
- (overwrite) Overwrites files that already exist. Default: false
- zl=4
- (ziplevel) Set compression level, 1 (low) to 9 (max). Default: 4
- int=f
- (interleaved) Determines whether INPUT file is considered interleaved. Default: false
- fastawrap=70
- Length of lines in fasta output. Default: 70 characters per line
- minscaf=1
- Ignore fasta reads shorter than this length. Default: 1
- qin=auto
- ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or auto. Default: auto
- qout=auto
- ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or auto (same as input). Default: auto
- ignorebadquality=f
- (ibq) Fix out-of-range quality values instead of crashing with a warning. Default: false
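The qin and qout offsets above can be illustrated with a short sketch. This is not the tool's code; convert_quality is a hypothetical helper that only shows what re-encoding a quality string between the 33 and 64 ASCII offsets involves.

```python
def convert_quality(qual: str, qin: int = 33, qout: int = 33) -> str:
    """Re-encode a FASTQ quality string from ASCII offset qin to qout.

    Hypothetical helper for illustration; rename.sh does this internally.
    """
    if qin not in (33, 64) or qout not in (33, 64):
        raise ValueError("offsets must be 33 (Sanger) or 64 (Illumina)")
    # Decode each character to its numeric score using the input offset.
    scores = [ord(c) - qin for c in qual]
    if any(q < 0 for q in scores):
        raise ValueError("quality value below zero; wrong input offset?")
    # Re-encode the same scores using the output offset.
    return "".join(chr(q + qout) for q in scores)


# A score of 40 is 'h' at offset 64 and 'I' at offset 33.
print(convert_quality("hhhh", qin=64, qout=33))
```

A negative decoded score is the usual sign that a file was read with the wrong offset, which is the kind of out-of-range value ignorebadquality is meant to handle.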
Renaming Mode Parameters (if not default)
- renamebyinsert=f
- Rename the read to indicate its correct insert size. Uses prefix="insert=" and adds calculated insert size to read names. Default: false
- renamebymapping=f
- Rename the read to indicate its correct mapping coordinates. Requires fastq output format. Default: false
- renamebytrim=f
- Rename the read to indicate its correct post-trimming length. Creates names like "ID_readlength_insertlength". Default: false
- renamebycoords=f
- Rename Illumina headers to leave coordinates but remove redundant info. Extracts coordinate information from Illumina headers. Default: false
- addprefix=f
- Rename the read by prepending the prefix to the existing name, keeping the original name intact. Default: false
- prefixonly=f
- Only use the prefix; don't add _<number> sequential numbering. All reads will have identical names. Default: false
- addunderscore=t
- Add an underscore after the prefix (if there is a prefix). Only applies when not using prefixonly mode. Default: true
- addpairnum=t
- Add a pairnum (e.g. ' 1:', ' 2:') to paired reads in some modes. Helps distinguish read pairs. Default: true
- fixsra=f
- Fixes headers of SRA reads renamed from Illumina. Specifically, it converts something like this: "SRR17611.11 HWI-ST79:17:D091UACXX:4:1101:210:824 length=75" into this: "HWI-ST79:17:D091UACXX:4:1101:210:824 1:". Default: false
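The fixsra conversion described above can be sketched with a regular expression. The pattern and the function name fix_sra_header are assumptions made for illustration; the tool's actual parsing rules may differ and may accept more header variants.

```python
import re

# Assumed SRA header shape: "<accession>.<n> <original Illumina name> length=<n>"
SRA_HEADER = re.compile(r"^\S+\.\d+ (\S+) length=\d+$")


def fix_sra_header(header: str, pairnum: int = 1) -> str:
    """Return the original Illumina name plus a pair number.

    Headers that do not match the assumed shape are returned unchanged.
    """
    match = SRA_HEADER.match(header)
    if match is None:
        return header
    return f"{match.group(1)} {pairnum}:"


print(fix_sra_header("SRR17611.11 HWI-ST79:17:D091UACXX:4:1101:210:824 length=75"))
```

With the example header from the fixsra description, this sketch yields "HWI-ST79:17:D091UACXX:4:1101:210:824 1:".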
Trimming Parameters
- trimleft=0
- Trim this many characters from the header start. Applied before other renaming operations. Default: 0
- trimright=0
- Trim this many characters from the header end. Applied before other renaming operations. Default: 0
- trimbeforesymbol=0
- Trim this many characters before the last instance of a specified symbol. Used with symbol parameter. Default: 0
- symbol=
- Trim before this symbol. This can be a literal like ':' or a word like tab or lessthan for reserved symbols. Works with trimbeforesymbol parameter.
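The trimming parameters above can be read as simple string operations. The sketch below is one interpretation, not the tool's code: trim_header and trim_before_symbol are hypothetical names, and the assumption is that trimbeforesymbol removes up to that many characters immediately preceding the last occurrence of the symbol.

```python
# Word aliases for reserved symbols; only the two named in the text are listed.
SYMBOL_WORDS = {"tab": "\t", "lessthan": "<"}


def trim_header(name: str, trimleft: int = 0, trimright: int = 0) -> str:
    """Drop trimleft characters from the start and trimright from the end."""
    end = len(name) - trimright
    return name[trimleft:end] if end > trimleft else ""


def trim_before_symbol(name: str, count: int, symbol: str) -> str:
    """Remove up to count characters just before the last symbol (assumed semantics)."""
    symbol = SYMBOL_WORDS.get(symbol, symbol)
    idx = name.rfind(symbol)
    if idx < 0 or count <= 0:
        return name  # symbol absent or nothing to trim
    start = max(0, idx - count)
    return name[:start] + name[idx:]


print(trim_header("abcdefghij", trimleft=2, trimright=3))
print(trim_before_symbol("read123:1", 3, ":"))
```

Under these assumptions, trimming 3 characters before the last ':' in "read123:1" leaves "read:1".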
Other Parameters
- reads=-1
- Set to a positive number to only process this many INPUT reads (or pairs), then quit. Default: -1 (process all reads)
- quantize=
- Set this to reduce compressed file size by binning quality scores. E.g., quantize=2 will eliminate odd qscores, keeping only even values.
Java Parameters
- -Xmx
- This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions. May provide minor performance improvement in production use.
Examples
Basic Sequential Renaming
rename.sh in=reads.fq out=renamed.fq prefix=sample
Renames reads to sample_1, sample_2, sample_3, etc.
Paired-End Renaming
rename.sh in1=reads_1.fq in2=reads_2.fq out1=renamed_1.fq out2=renamed_2.fq prefix=experiment
Renames paired reads to experiment_1 1:, experiment_1 2:, experiment_2 1:, experiment_2 2:, etc.
Add Prefix to Existing Names
rename.sh in=reads.fq out=prefixed.fq prefix=lib1_ addprefix=t
Prepends "lib1_" to existing read names, preserving original identifiers.
Trim Headers
rename.sh in=reads.fq out=trimmed.fq trimleft=5 trimright=10
Removes 5 characters from the start and 10 characters from the end of each header.
Fix SRA Headers
rename.sh in=sra_reads.fq out=fixed.fq fixsra=t
Converts SRA-style headers back to original Illumina format with proper pair numbering.
Rename by Insert Size
rename.sh in1=reads_1.fq in2=reads_2.fq out=inserts.fq renamebyinsert=t
Renames reads to indicate their calculated insert sizes (requires paired reads).
Algorithm Details
Renaming Strategy
The rename tool implements several distinct renaming strategies that can be selected via parameters:
Sequential Numbering (Default)
The default mode assigns sequential numbers to reads in the format prefix_number. For paired reads, both mates receive the same number with different pair identifiers (1: and 2:). The numbering counter increments only after processing both mates of a pair.
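The numbering scheme described above can be sketched as follows. This is an illustration only; sequential_names is a hypothetical function, and starting the counter at 1 follows the examples in this document rather than the tool's source.

```python
def sequential_names(num_records, prefix, paired=False,
                     add_underscore=True, add_pairnum=True):
    """Generate names for num_records reads (or pairs) in input order."""
    sep = "_" if (add_underscore and prefix) else ""
    names = []
    for number in range(1, num_records + 1):
        base = f"{prefix}{sep}{number}"
        if paired:
            # Both mates share one number; the counter advances once per pair.
            if add_pairnum:
                names.append((base + " 1:", base + " 2:"))
            else:
                names.append((base, base))
        else:
            names.append(base)
    return names


print(sequential_names(2, "sample"))
print(sequential_names(2, "experiment", paired=True))
```

The keyword arguments mirror the addunderscore and addpairnum parameters, so switching them off shows how the generated names change.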
Insert Size-Based Renaming
When renamebyinsert=true, the tool calculates the insert size of each read pair and writes it into the read names. The algorithm uses Read.insertSizeMapped() to determine the distance between the paired reads.
Coordinate-Based Renaming
The renamebycoords mode uses IlluminaHeaderParser2 to extract coordinate information from Illumina headers. It rebuilds a minimal header that keeps the coordinates and drops the redundant fields.
Trim-Based Renaming
The renamebytrim mode builds each name from a numeric ID, the read length, and the calculated insert size, using the format: numericID_readlength_insertsize. Both lengths can then be read directly from the header.
Header Processing
The tool provides header trimming capabilities:
- Left/Right Trimming: Removes specified numbers of characters from header start or end
- Symbol-Based Trimming: Uses the trimBeforeSymbol() method to remove characters before the last occurrence of a specified symbol
- SRA Header Fixing: Parses SRA-format headers using regex patterns to extract original Illumina identifiers
Memory Efficiency
The rename operation processes reads in streaming fashion using ConcurrentReadInputStream and ConcurrentReadOutputStream. This approach maintains constant memory usage regardless of input file size, making it suitable for large-scale sequencing datasets.
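The streaming approach can be illustrated with a minimal sketch that renames one record at a time, so memory use does not grow with file size. It is not the tool's implementation (which is concurrent Java); rename_fastq_stream is a hypothetical function and it assumes unwrapped four-line FASTQ records.

```python
import io
from typing import TextIO


def rename_fastq_stream(src: TextIO, dst: TextIO, prefix: str) -> int:
    """Rename FASTQ records one at a time; returns the number of records written."""
    count = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = src.readline()
        plus = src.readline()
        qual = src.readline()
        if not qual:
            raise ValueError("truncated FASTQ record")
        count += 1
        # Only the header changes; sequence and quality pass through untouched.
        dst.write(f"@{prefix}_{count}\n{seq}{plus}{qual}")
    return count


demo_in = io.StringIO("@a\nACGT\n+\nIIII\n@b\nTTTT\n+\nIIII\n")
demo_out = io.StringIO()
print(rename_fastq_stream(demo_in, demo_out, "sample"))
print(demo_out.getvalue(), end="")
```

Only one record is held in memory at any time, which is the property that makes a streaming design practical for large sequencing datasets.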
Quality Score Handling
When the quantize parameter is set (quantizeQuality internally), the tool applies Quantizer.quantize() to bin quality scores into fewer distinct values. Fewer distinct values compress better, so output files are smaller, at the cost of fine-grained quality resolution.
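Quality binning can be sketched as below. This is an assumption-laden illustration: quantize_quality is a hypothetical function, and it rounds each score down to the nearest multiple of the step, whereas the tool's Quantizer may choose bin values differently.

```python
def quantize_quality(qual: str, step: int = 2, offset: int = 33) -> str:
    """Bin quality scores to multiples of step (rounding down is an assumption)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    binned = []
    for char in qual:
        score = ord(char) - offset
        # With step=2, odd scores collapse onto the even score just below.
        binned.append(chr(score - score % step + offset))
    return "".join(binned)


# Scores 40, 41, 42 become 40, 40, 42 with step=2.
print(quantize_quality("IJK", step=2))
```

Collapsing neighbouring scores produces longer runs of identical characters, which is why the compressed output tends to be smaller.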
Paired-Read Awareness
The tool maintains proper paired-read relationships throughout all renaming operations. It ensures that paired reads receive coordinated names and applies pair-specific suffixes (" 1:" and " 2:") when appropriate. The pair numbering system uses the pairnum() method to correctly identify which read in a pair is being processed.
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org