RemoveHuman

Script: removehuman.sh Package: align2 Class: BBMap.java

Removes reads that map to the human genome with at least 95% identity after quality trimming. Uses hard-coded paths for NERSC systems. Removes approximately 98.6% of human 2x150bp reads with zero false-positives to non-animals. Requires at least 16GB RAM.

Basic Usage

removehuman.sh in=<input file> outu=<clean output file>

Input may be fasta or fastq, compressed or uncompressed.

Important: This program uses hard-coded paths and will only run on NERSC systems unless you change the path parameter. The default path is /global/cfs/cdirs/bbtools/hg19.

Parameters

Parameters control the human genome removal process, including threading, file handling, quality trimming, and mapping behavior.

Basic Parameters

threads=auto
(t) Set number of threads to use; default is number of logical processors.
overwrite=t
(ow) Set to false to force the program to abort rather than overwrite an existing file.
interleaved=auto
(int) If true, forces fastq input to be paired and interleaved.

Quality Trimming Parameters

trim=t
Trim read ends to remove bases with quality below minq. Values: t (trim both ends), f (neither end), r (right end only), l (left end only).
untrim=t
Undo the trimming after mapping. This allows accurate mapping while preserving original read lengths in output.
minq=4
Trim quality threshold. Bases with quality scores below this value will be trimmed during the mapping process.

Output Parameters

ziplevel=2
(zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.
outm=<file>
File to output the reads that mapped to human genome. These are the contaminating reads that were removed.

Reference Path Parameter

path=
Set the path to an indexed human genome. Default is /global/cfs/cdirs/bbtools/hg19 for NERSC systems. Must be changed for use on other systems.

Additional BBMap Parameters

All BBMap parameters can be used with removehuman.sh. The tool internally uses BBMap with the following optimized settings:

minratio=0.9
Minimum alignment score ratio (set internally)
maxindel=3
Maximum indel length (set internally)
bwr=0.16
Bandwidth ratio for alignment (set internally)
bw=12
Bandwidth for alignment (set internally)
minhits=2
Minimum seed hits required (set internally)
k=14
Kmer length for mapping (set internally)
kfilter=25
Kmer filter threshold (set internally)
maxsites=1
Maximum mapping sites per read (set internally)

These internal settings are optimized for human contamination removal but can be overridden by specifying them explicitly.

Examples

Basic Human Removal

removehuman.sh in=sample.fastq.gz outu=clean_sample.fastq.gz

Removes human contamination from compressed FASTQ reads, outputting clean reads to clean_sample.fastq.gz

Paired-end Reads with Contamination Output

removehuman.sh in=reads_R1.fq in2=reads_R2.fq outu=clean_R1.fq outu2=clean_R2.fq outm=human_contam.fq

Process paired-end reads, output clean reads separately and save human contamination to a separate file

Custom Path and Threading

removehuman.sh in=data.fq outu=clean_data.fq path=/custom/path/to/human/index threads=8

Use a custom human genome index path and specify 8 threads for processing

Adjust Quality Trimming

removehuman.sh in=reads.fq outu=clean.fq minq=10 trim=r untrim=f

Trim low-quality bases (Q<10) from the right end only and keep the trimmed length in output

Algorithm Details

RemoveHuman is a wrapper around BBMap configured for identifying and removing human genomic contamination from sequencing data. The implementation uses specific parameter combinations:

Mapping Strategy

The tool uses BBMap's align2.BBMap engine with predefined parameters for contamination detection:

Quality Control Integration

The tool integrates quality trimming directly into the mapping process:

Performance Characteristics

RemoveHuman implementation characteristics:

Accuracy Metrics

Based on validation with human 2x150bp reads:

Technical Implementation

The tool uses BBMap's align2.BBMap mapping engine with these components:

System Requirements

Notes and Limitations

Support

For questions and support: