Translate6Frames

Script: translate6frames.sh
Package: jgi
Class: TranslateSixFrames.java

Translates nucleotide sequences to all 6 amino acid frames, or amino acids to a canonical nucleotide representation. Input may be fasta or fastq, compressed or uncompressed.

Basic Usage

translate6frames.sh in=<input file> out=<output file>

The tool has two modes. By default it translates nucleotide sequences to amino acids in up to six frames. With aain=t and aaout=f it converts amino acids back to a canonical nucleotide representation.

Parameters

Parameters are organized by function into input handling, output formatting, and Java runtime configuration.

Input parameters

in=<file>
Main input file. Use in=stdin.fa to pipe from stdin. Accepts fasta or fastq format, compressed or uncompressed.
in2=<file>
Input for 2nd read of pairs in a different file. Used for paired-end data stored in separate files.
int=auto
(interleaved) Set to t/f to override interleaved autodetection. Auto-detection examines file structure to determine if reads are interleaved.
qin=auto
Input quality offset: 33 (Sanger), 64 (Illumina 1.3+), or auto. Auto-detection examines quality scores to determine encoding.
aain=f
Set to true if input sequences are amino acids instead of nucleotides. When true, performs reverse translation to canonical nucleotides.
reads=-1
If positive, quit after processing this many reads or pairs. Useful for testing or processing subsets of large files.
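
As a rough illustration of what qin=auto has to decide, the sketch below guesses the offset from the lowest quality character seen. This is a common heuristic and not the detection logic BBTools uses; the function name guess_quality_offset and the cutoff of 59 are assumptions made for this example.

```python
def guess_quality_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Heuristic (an assumption for this sketch): characters with an ASCII
    code below 59 cannot occur in offset-64 data, so seeing one implies
    offset 33. Otherwise offset 64 is assumed.
    """
    lowest = min(
        (min(map(ord, q)) for q in quality_strings if q),
        default=None,
    )
    if lowest is None:
        # No quality data at all: fall back to the Sanger offset.
        return 33
    return 33 if lowest < 59 else 64


# '#' has ASCII code 35, which is only valid with offset 33.
print(guess_quality_offset(["IIIIIII#", "IIIIHHHG"]))
# Lowercase letters are typical of offset-64 data.
print(guess_quality_offset(["hhhhhgfe"]))
```

Offset-33 data in which every base scores Q26 or higher would be misclassified by this sketch, so setting qin explicitly is safer when the encoding is known.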

Output parameters

out=<file>
Write output here. Use 'out=stdout.fa' to write to standard output; the extension (here .fa) tells the tool which format to write.
out2=<file>
Use this to write 2nd read of pairs to a different file. If omitted, paired output is written interleaved to the file given by out.
overwrite=t
(ow) Grant permission to overwrite existing output files. Set to false to prevent accidental overwrites.
append=f
Append to existing files instead of overwriting. Useful for combining results from multiple runs.
ziplevel=2
(zl) Compression level for gzipped output; 1 (fastest compression) through 9 (best compression). Higher values use more CPU time.
fastawrap=80
Length of lines in fasta output. Sequences longer than this value are wrapped to multiple lines.
qout=auto
Output quality offset: 33 (Sanger), 64 (Illumina 1.3+), or auto. Auto uses same encoding as input.
aaout=t
Set to false to output nucleotides, true for amino acids. When translating nucleotides, this determines final output format.
tag=t
Tag read ID with the frame number, adding suffixes like ' fr1', ' fr2', etc. Helps identify which frame each translated sequence came from.
frames=6
Only print this many frames (1-6). If you already know the strand (sense) of the input, set 'frames=3' to translate only the three forward frames. The default of 6 translates all forward and reverse frames.
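
To make the effect of fastawrap concrete, here is a minimal sketch of wrapping a sequence at a fixed line length. The helper name wrap_fasta is hypothetical and the code only illustrates the behavior described above; it is not taken from the BBTools source.

```python
def wrap_fasta(name, seq, width=80):
    """Format one fasta record, breaking the sequence every `width` characters."""
    lines = [">" + name]
    if width < 1:
        # Treat a non-positive width as "do not wrap" (an assumption of this sketch).
        lines.append(seq)
    else:
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# A 170-residue sequence becomes lines of 80, 80 and 10 characters.
record = wrap_fasta("seq1 fr1", "M" * 170, width=80)
print(record, end="")
```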

Java Parameters

-Xmx
Set Java's memory usage, overriding autodetection. -Xmx20g specifies 20 gigabytes of RAM, -Xmx200m specifies 200 megabytes. The maximum is typically 85% of physical memory.
-eoom
Exit if an out-of-memory exception occurs. Requires Java 8u92 or later. Prevents incomplete output when memory is exhausted.
-da
Disable Java assertions. May provide minor performance improvement in production use.

Examples

Basic 6-frame Translation

translate6frames.sh in=genes.fasta out=proteins.fasta

Translates nucleotide sequences to amino acids in all 6 reading frames. Each input sequence generates 6 output sequences tagged with frame identifiers (fr1-fr6).
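
For readers unfamiliar with six-frame translation, the following sketch shows the general idea using the standard genetic code. It is not the tool's implementation. The function names, the frame ordering (fr1-fr3 forward at offsets 0-2, fr4-fr6 on the reverse complement), the use of 'X' for unrecognized codons, and the dropping of trailing partial codons are all assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(name, seq, frames=6, tag=True):
    """Return (name, protein) pairs for the first `frames` reading frames."""
    strands = (seq, reverse_complement(seq))
    results = []
    for f in range(min(frames, 6)):
        strand = strands[f // 3]          # frames 0-2 forward, 3-5 reverse
        protein = translate(strand[f % 3:])
        results.append((f"{name} fr{f + 1}" if tag else name, protein))
    return results


for read_name, protein in six_frames("read1", "ATGGCCTAA"):
    print(read_name, protein)
```

Passing frames=3 or tag=False to six_frames mirrors the frames=3 and tag=f options in the sense described in this document.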

Forward Frames Only

translate6frames.sh in=orfs.fasta out=proteins.fasta frames=3

Translates only the three forward reading frames (fr1-fr3), useful when the strand orientation is known.

Reverse Translation

translate6frames.sh in=proteins.fasta out=nucleotides.fasta aain=t aaout=f

Converts amino acid sequences back to canonical nucleotide representations using the genetic code.
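
Because most amino acids are encoded by several codons, reverse translation has to pick one codon per residue. The sketch below illustrates the idea by choosing, for each amino acid, the first codon of the standard genetic code in TCAG order. Which codon the tool actually treats as canonical is not stated in this document, so that choice, the 'NNN' fallback, and the name reverse_translate are assumptions of this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical "canonical" codon: the first one listed for each amino acid.
CANONICAL = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CANONICAL.setdefault(aa, "".join(codon))


def reverse_translate(protein):
    """Map each residue to one fixed codon; unknown residues become 'NNN'."""
    return "".join(CANONICAL.get(aa, "NNN") for aa in protein.upper())


print(reverse_translate("MKW"))
```

The result is always three times the length of the input, and translating it forward again yields the original protein, although the original nucleotide sequence cannot be recovered.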

Paired-End Processing

translate6frames.sh in=reads_1.fq in2=reads_2.fq out=proteins_1.fq out2=proteins_2.fq

Processes paired-end reads, translating both read 1 and read 2 to amino acids while maintaining pairing information.

No Frame Tagging

translate6frames.sh in=sequences.fasta out=translated.fasta tag=f

Translates sequences without adding frame identifiers, so every translated frame keeps the unmodified name of its source read.

Algorithm Details

Translation Implementation

The core translation functionality is implemented in the toFrames() method (lines 341-359), which performs nucleotide-to-amino acid conversion using the AminoAcid class methods:

Bidirectional Processing Architecture

The tool implements bidirectional translation controlled by boolean flags NT_IN and NT_OUT (lines 387-388):

Memory and Performance Architecture

The implementation uses concurrent streaming patterns for memory efficiency:

Quality Score Processing Details

Quality score handling follows specific compression algorithms:

Input/Output Stream Management

File handling uses the BBTools streaming architecture:

Support

For questions and support: