BBTools Universal Parameters

Applies to: Most BBTools programs
Source: shared/Parser.java, shared/PreParser.java

These universal parameters work across most BBTools programs and provide consistent behavior for memory management, threading, input/output operations, compression settings, quality control, and more. They can be used with any BBTools program unless specifically noted otherwise.

Overview

BBTools programs share a common set of parameters that provide consistent functionality across the entire suite. Understanding these universal parameters allows you to effectively use any BBTools program and create robust data processing pipelines.

Parameter Syntax

Most parameters use the format parameter=value; Java flags such as -Xmx4g and -ea are passed as-is
Parameter names are case-insensitive; values are case-sensitive
Boolean parameters accept t/f or true/false; a missing (null) value is treated as true
File paths containing spaces should be quoted
Use null to explicitly set a parameter to empty/default
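
The quoting rule matters because the shell, not BBTools, splits the command line into arguments. The sketch below illustrates this with show_args, a hypothetical stand-in for any BBTools script that simply prints each argument it receives. The file name "my reads.fq" is also invented for illustration.

```shell
# show_args is a hypothetical stand-in for a BBTools script:
# it prints each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the path arrives as a single in= argument.
show_args in="my reads.fq" out=clean.fq

# Unquoted: the shell splits the path, so "reads.fq" becomes a stray argument.
show_args in=my reads.fq out=clean.fq
```

The first call prints two arguments and the second prints three, which is how an unquoted path ends up being misread as an extra parameter.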

Parameters

Memory Management

-Xmx<size>: Sets maximum heap memory for Java. Examples: -Xmx4g (4 gigabytes), -Xmx200m (200 megabytes). Default is auto-detected from available system memory. Use approximately 85% of physical RAM for optimal performance.
-Xms<size>: Sets initial heap memory for Java. Usually not needed as JVM will grow memory as required. Example: -Xms1g
-ea: Enable assertions (default). Programs crash when they detect an internal error, which helps identify bugs or data corruption.
-da: Disable assertions. Ignore internal error checking. May provide slight speed increase but could mask problems. Use with caution.
-eoom: Exit on out-of-memory error. Available in Java 8u92+. Cleanly exits the program when memory is exhausted rather than hanging.

Threading & Performance

threads=auto: (t) Number of worker threads to use. Default auto-detects all available logical processors. Set lower on shared systems. Examples: t=4, threads=8
monitor=f: Launch memory/CPU monitoring and kill the process if it hangs. For example, monitor=600,0.01 kills the process after 600 seconds below 1% CPU usage. Useful for batch jobs.
usejni=f: (jni) Enable JNI-accelerated versions of BBMerge, BBMap, and Dedupe. Requires compilation of C code. Provides speed improvements.
usempi=f: (mpi) Enable MPI support for distributed processing. Most programs are not currently MPI-capable. Can also specify number of ranks: mpi=4
readbufferlength=200: (readbufferlen) Number of reads per ListNum (work unit). Affects memory usage and load balancing between threads.
readbuffers=auto: Number of ListNums to buffer. Default is 150% of thread count. Higher values use more memory but improve thread utilization.

Input/Output Parameters

in=<file>: (in1, input, input1) Primary input file. Use stdin.fq to read from standard input. Supports FASTA, FASTQ, SAM, compressed files.
in2=<file>: (input2) Secondary input file for paired reads. Use when reads are in separate files rather than interleaved.
out=<file>: (out1, output, output1) Primary output file. Use stdout.fq to write to standard output. Format auto-detected from extension.
out2=<file>: (output2) Secondary output file for paired reads. Used when writing separate files instead of interleaved output.
overwrite=t: (ow) Allow overwriting existing files. Set to false to prevent accidental overwriting of important files.
append=f: (app) Append to existing files instead of overwriting. Useful for combining results from multiple runs.
ordered=f: Output reads in same order as input. Slight performance penalty but maintains read order for reproducible results.
interleaved=auto: (int) Force interleaved mode. t=interleaved, f=not interleaved, auto=autodetect. Must be set when piping FASTQ through stdin.
reads=-1: (maxreads) Stop after processing this many reads/pairs. Useful for testing or processing subsets. -1 means no limit.
samplerate=1: Randomly sample this fraction of reads. 0.5 processes half the reads. Useful for creating test datasets or reducing computation.
sampleseed=-1: Random seed for sampling. Positive values enable reproducible random sampling. -1 uses current time.
extin=: Override input file format detection. Force interpretation as specific format regardless of filename.
extout=: Override output file format. Force specific output format regardless of filename.

Quality Encoding

qin=auto: (qualityin, asciiin) Input quality encoding: 33 (Sanger, Illumina 1.8+), 64 (older Illumina, versions 1.3 to 1.7), or auto-detect. Most modern data uses 33.
qout=auto: (qualityout, asciiout) Output quality encoding: 33, 64, or auto (same as input). Usually 33.
ignorebadquality=f: (ibq) Don't crash on quality values that appear incorrect. Useful for handling data with quality issues.
maxcalledquality=41: Cap quality scores at this upper limit. Prevents extremely high quality values that may cause issues.
mincalledquality=0: Set quality scores below this to this value. Prevents negative quality scores.
fakequality=30: (qfake) Quality value to use when converting FASTA to FASTQ format.
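
To make the two offsets concrete, the sketch below decodes a single quality character under each encoding. phred_score is a hypothetical helper written for this illustration, not a BBTools command. It only shows the arithmetic that qin=33 and qin=64 imply.

```shell
# phred_score is a hypothetical helper, not a BBTools command.
# It decodes one quality character to a Phred score for a given ASCII offset.
phred_score() {
    char=$1
    offset=$2
    # printf with a leading quote yields the character's ASCII code.
    code=$(printf '%d' "'$char")
    echo $(( code - offset ))
}

phred_score I 33   # ASCII 73 - 33 = 40
phred_score h 64   # ASCII 104 - 64 = 40
phred_score I 64   # ASCII 73 - 64 = 9: same character, wrong offset
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding should be set explicitly when auto-detection is not reliable for a given file.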

Compression

ziplevel=2: (zl) Compression level for gzip output. 1 (fastest) to 9 (smallest). 2 provides good balance of speed and compression.
pigz=auto: Use parallel gzip for compression. t=enable, f=disable, number=use exactly that many threads. Faster than standard gzip.
unpigz=auto: Use parallel gzip for decompression. Generally provides moderate speed improvement for reading compressed files.
usegzip=t: (gzip) Enable gzip compression support. Usually not necessary to change.
usebzip2=t: (bzip2) Enable bzip2 compression support. Slower but better compression than gzip.
usepbzip2=t: (pbzip2) Use parallel bzip2 if available. Faster than standard bzip2.

Basic Filtering

minlen=0: (ml, minlength) Discard reads shorter than this after processing. Essential for removing short fragments.
maxlen=: (maxlength, maxreadlength) Discard reads longer than this after processing. Can also break long reads into pieces.
mingc=0: Discard reads with GC content below this fraction (0-1). Example: mingc=0.2 removes reads with <20% GC.
maxgc=1: Discard reads with GC content above this fraction (0-1). Example: maxgc=0.8 removes reads with >80% GC.
maxns=-1: Discard reads with more than this many N bases. Useful for removing low-quality sequences.
minavgquality=0: (maq) Discard reads with average quality below this. Can specify quality,bases like maq=20,50 for first 50 bases.
minbasequality=0: (mbq) Discard reads if any base has quality below this threshold.
minconsecutivebases=0: (mcb) Discard reads without at least this many consecutive non-N bases.
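
As an illustration of what the mingc and maxgc thresholds compare against, the sketch below computes a GC fraction in plain shell. gc_fraction is a hypothetical helper, and it assumes every character counts toward the read length; BBTools may handle N or ambiguous bases differently.

```shell
# gc_fraction is a hypothetical helper: fraction of G/C among all bases.
# Assumption: every character counts toward the length (N included).
gc_fraction() {
    bases=$1
    gc=$(printf '%s' "$bases" | tr -cd 'GCgc' | wc -c)
    awk -v gc="$gc" -v len="${#bases}" 'BEGIN { printf "%.2f\n", gc / len }'
}

gc_fraction GGCCAATT     # 4 of 8 bases are G/C: prints 0.50
gc_fraction ATATATATAG   # 1 of 10 bases is G/C: prints 0.10
```

Under these assumptions the second read would be discarded by mingc=0.2, while the first passes both mingc=0.2 and maxgc=0.8.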

Basic Trimming

qtrim=f: Quality trimming mode. f=none, l=left end, r=right end, rl=both ends, w=sliding window. Uses Phred algorithm.
trimq=6: (trimquality) Quality threshold for trimming. Regions with average quality below this are removed.
forcetrimleft=0: (ftl) Force-trim bases to the left of this position (0-based, exclusive), regardless of quality. Example: ftl=10 removes the first 10 bases.
forcetrimright=0: (ftr) Force-trim bases to right of this position (0-based, exclusive).
forcetrimright2=0: (ftr2) Force-trim this many bases from right end.
forcetrimmod=0: (ftm) Trim right end to make length divisible by this number. ftm=5 converts 151bp reads to 150bp.
minlengthfraction=0: (mlf) Discard reads trimmed to less than this fraction of original length.
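
The forcetrimmod arithmetic can be written out directly. ftm_length below is a hypothetical helper that only reproduces the length calculation described for ftm; it does not touch any reads.

```shell
# ftm_length is a hypothetical helper: read length after forcetrimmod.
# $1 = original read length, $2 = ftm value.
ftm_length() {
    len=$1
    mod=$2
    echo $(( len - len % mod ))
}

ftm_length 151 5   # prints 150
ftm_length 150 5   # prints 150 (already divisible, nothing trimmed)
ftm_length 101 5   # prints 100
```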

Format & Conversion

touppercase=f: (tuc) Convert all bases to uppercase. Useful for standardizing mixed-case input.
lowercaseton=f: (lctn) Convert lowercase bases to N. Some tools treat lowercase as low-confidence.
utot=f: Convert U bases (RNA) to T bases (DNA). Useful for processing RNA sequences with DNA tools.
fastawrap=70: (wrap) Line length for FASTA output. 0 means no wrapping (single line per sequence).
trimreaddescription=f: (trd, trc) Trim read names at first whitespace. Simplifies read names for downstream tools.
undefinedton=f: (iupacton, itn) Convert IUPAC ambiguous bases (R, Y, etc.) to N.

Histogram Generation

bhist=<file>: Base composition histogram by position. Shows A/T/G/C content across read positions.
qhist=<file>: Quality histogram by position. Shows quality score distribution across read positions.
qchist=<file>: (qdhist, qfhist) Quality count histogram. Count of bases with each quality value.
aqhist=<file>: Average quality histogram. Distribution of average quality scores per read.
bqhist=<file>: Base quality histogram designed for box plots.
lhist=<file>: Read length histogram. Distribution of read lengths after processing.
gchist=<file>: GC content histogram. Distribution of per-read GC percentages.
gcbins=100: Number of bins for GC histogram. Set to 'auto' to use read length as number of bins.
enthist=<file>: (entropyhist, enhist) Entropy histogram. Sequence complexity distribution.
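
Histogram files are plain text, so they are easy to summarize with standard tools. The sketch below computes a mean read length from a length histogram. It assumes a tab-delimited layout with #-prefixed header lines, the length in column 1 and the read count in column 2. The sample file and the mean_length helper are invented for illustration, so check the layout of your own lhist output before relying on it.

```shell
# Write a small, invented length histogram in the assumed layout.
printf '#Length\tCount\n100\t2\n150\t6\n' > lengths_example.txt

# mean_length is a hypothetical helper: count-weighted mean of column 1.
mean_length() {
    awk -F '\t' '!/^#/ && NF >= 2 { bases += $1 * $2; reads += $2 }
                 END { if (reads > 0) printf "%.1f\n", bases / reads }' "$1"
}

mean_length lengths_example.txt   # (100*2 + 150*6) / 8 = 137.5
```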

Advanced Parameters

verbose=f: Print detailed status messages for debugging. Shows progress information and internal details.
silent=f: Suppress most output messages. Useful in automated pipelines where output should be minimal.
config=<file>: Load parameters from config file, one parameter per line. Useful for complex parameter sets.
amino=f: Process amino acid sequences instead of nucleotides. Limited support in most tools.
parsecustom=f: (fastqparsecustom) Parse custom data from synthetic read names created by RandomReads.
testsize=f: Verify file sizes are correct. Useful for debugging file corruption issues.
bf1=auto: Force ByteFile1 mode for reading files. Usually auto-detection works fine.
bf2=auto: Force ByteFile2 mode for reading files (faster). Usually auto-detection works fine.

Usage Examples

Memory and Threading

# Limit memory usage and threads for shared systems
reformat.sh in=reads.fq out=reformatted.fq -Xmx8g t=4

# Enable monitoring for batch jobs
bbduk.sh in=reads.fq out=clean.fq monitor=3600,0.01 ref=adapters

On shared systems, cap memory and threads explicitly so the job stays within its allocation and does not starve other users.

Input/Output Control

# Process subset with reproducible sampling
reformat.sh in=large.fq out=subset.fq samplerate=0.1 sampleseed=42 reads=1000000

# Force interleaved mode when piping
cat reads.fq.gz | bbduk.sh in=stdin.fq.gz out=stdout.fq int=t ref=adapters | gzip > clean.fq.gz

Sampling with a fixed seed gives a repeatable subset for testing, and streaming through pipes avoids writing intermediate files for large datasets.

Quality Control

# Basic quality filtering with trimming
bbduk.sh in=reads.fq out=clean.fq qtrim=rl trimq=10 minlen=50 maq=20

# Comprehensive quality control
bbduk.sh in=reads.fq out=clean.fq qtrim=r trimq=8 minlen=30 maxns=2 mingc=0.1 maxgc=0.9

Quality filtering removes poor-quality sequences that could impact downstream analysis.

Format Conversion

# Convert FASTA to FASTQ with quality values
reformat.sh in=sequences.fa out=sequences.fq qfake=30

# Convert to unwrapped FASTA with uppercase bases and trimmed read names
reformat.sh in=reads.fq out=clean.fa tuc=t trd=t fastawrap=0

Format conversion is often needed when integrating data from different sources.

Histogram Generation

# Generate comprehensive quality metrics
reformat.sh in=reads.fq out=clean.fq \
    bhist=base_comp.txt qhist=qual_pos.txt aqhist=avg_qual.txt \
    lhist=lengths.txt gchist=gc_content.txt

# Quality control with histograms
bbduk.sh in=reads.fq out=clean.fq ref=adapters \
    qhist=qual_pos.txt lhist=lengths.txt

Histograms provide essential quality metrics for assessing data characteristics.

Config File Usage

# Create config file: preprocessing.conf
# qtrim=rl
# trimq=10
# minlen=50
# maq=20
# ref=adapters
# overwrite=t

# Use config file
bbduk.sh in=reads.fq out=clean.fq config=preprocessing.conf

Config files simplify complex parameter sets and enable reproducible processing.
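
One way to create the config file shown above is a here-document, which keeps the one-parameter-per-line layout explicit. The guard around the bbduk.sh call is an addition for this sketch, so the snippet can be run safely on a machine without BBTools or without reads.fq.

```shell
# Write the parameters to a config file, one per line.
cat > preprocessing.conf <<'EOF'
qtrim=rl
trimq=10
minlen=50
maq=20
ref=adapters
overwrite=t
EOF

# Run BBDuk only if it is installed and the input exists; otherwise just report.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f reads.fq ]; then
    bbduk.sh in=reads.fq out=clean.fq config=preprocessing.conf
else
    echo "bbduk.sh or reads.fq not found; config written to preprocessing.conf"
fi
```

Keeping the config file under version control alongside the pipeline script records exactly which settings produced a given output.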

Important Notes

Parameter Compatibility

Not all parameters work with all tools: Tool-specific parameters may override or conflict with universal parameters
Some tools ignore certain parameters: For example, Tadpole doesn't use many I/O parameters
Check tool documentation: Individual tool pages list which parameters are supported
Parameter precedence: Later parameter values override earlier ones in the same command
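
The rule that later values override earlier ones is handy for pipelines that prepend defaults and append per-run overrides. The sketch below mimics that rule in plain shell. effective_value is a hypothetical helper written for illustration, not BBTools' actual parser, and it does not resolve aliases such as t and threads.

```shell
# effective_value is a hypothetical helper that mimics "last one wins":
# it scans key=value arguments and keeps the final value seen for a key.
effective_value() {
    key=$1
    shift
    value=
    for arg in "$@"; do
        case $arg in
            "$key"=*) value=${arg#*=} ;;
        esac
    done
    printf '%s\n' "$value"
}

# Pipeline defaults first, per-run overrides appended last.
# $defaults is left unquoted on purpose so the shell splits it into arguments.
defaults="t=4 minlen=50"
effective_value t $defaults in=reads.fq t=8        # prints 8
effective_value minlen $defaults in=reads.fq t=8   # prints 50
```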

Memory Guidelines

Small files: 1-4GB usually sufficient (-Xmx4g)
Large references: Use ~85% of physical memory (-Xmx27g on 32GB machine)
Error handling: Use -eoom for clean exits on memory exhaustion
Shared systems: Be conservative with memory requests
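
The 85% guideline is simple integer arithmetic. xmx_flag below is a hypothetical helper that turns a RAM size in whole gigabytes into the matching -Xmx flag; the physical RAM figure is supplied by the caller rather than detected.

```shell
# xmx_flag is a hypothetical helper: -Xmx value at ~85% of physical RAM.
# $1 = physical RAM in whole gigabytes; integer division rounds down.
xmx_flag() {
    ram_gb=$1
    echo "-Xmx$(( ram_gb * 85 / 100 ))g"
}

xmx_flag 32    # prints -Xmx27g
xmx_flag 128   # prints -Xmx108g
```

The result can be passed straight to a tool, for example: reformat.sh in=reads.fq out=reformatted.fq "$(xmx_flag 32)" t=4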

Threading Best Practices

Dedicated systems: Default auto-detection usually optimal
Shared systems: Limit threads based on allocation (t=4)
I/O bound tasks: More threads may not improve performance
Memory vs threads: Sometimes better to use fewer threads with more memory per thread
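
On a scheduler-managed cluster, the thread count can follow the job's allocation instead of being hard-coded. The sketch below assumes the job runs under SLURM, which sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested. thread_flag is a hypothetical helper, and the fallback of 4 threads is an arbitrary conservative choice.

```shell
# thread_flag is a hypothetical helper: emit t=<n> from a scheduler allocation.
# Assumption: SLURM sets SLURM_CPUS_PER_TASK; fall back to 4 threads otherwise.
thread_flag() {
    echo "t=${SLURM_CPUS_PER_TASK:-4}"
}

thread_flag
```

Other schedulers expose the allocation under different variable names, so the variable would need to be swapped accordingly.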

Support

For questions about universal parameters:

Email: bbushnell@lbl.gov
Documentation: bbmap.org
Usage Guide: Read bbtools/docs/UsageGuide.txt for comprehensive parameter documentation
Tool-specific help: Run any tool without parameters to see its specific options