BBWrap

Script: bbwrap.sh Package: align2 Class: BBWrap.java

Simple wrapper that allows BBMap to be run multiple times without reloading the index each time. Particularly useful for saving compute resources when processing multiple datasets against large references, and for handling mixed paired and unpaired reads in the same workflow.

When to Use BBWrap

BBWrap is designed for specific scenarios where standard BBMap is inefficient or cannot handle the input:

Primary Use Cases

  • Multiple datasets with large references: Save time by avoiding repeated index loading for each dataset
  • Mixed paired and unpaired reads: BBMap on its own cannot process paired and unpaired reads in the same run; BBWrap can
  • Batch processing workflows: Process many files systematically while maintaining consistent reference indexing
  • Resource-constrained environments: Particularly beneficial when each dataset has a small number of reads and the reference genome is large

Important Limitation: BBWrap will not work with stdin/stdout or histogram output. Use standard BBMap for those cases.

Basic Usage

bbwrap.sh ref=<reference fasta> in=<file,file,...> out=<file,file,...>

Usage Patterns

To index only:

bbwrap.sh ref=<reference fasta>

To map to an existing index:

bbwrap.sh in=<file,file,...> out=<file,file,...>

To map pairs and singletons to the same output file:

bbwrap.sh in1=read1.fq,singleton.fq in2=read2.fq,null out=mapped.sam append

Input Parameters

in=<file,file>
Input sequences to map. Accepts comma-separated list of files.
in1=<file,file>
Input sequences for read 1 (paired-end). Accepts comma-separated list of files.
in2=<file,file>
Input sequences for read 2 (paired-end). Use "null" as placeholder for unpaired datasets.
inlist=<fofn>
File containing list of input files, one per line. Alternative to comma-separated lists.
in1list=<fofn>
File containing list of read 1 input files, one per line.
in2list=<fofn>
File containing list of read 2 input files, one per line.

Output Parameters

Primary Output

out=<file,file>
Primary output files. Accepts comma-separated list of files.
out1=<file,file>
Output file for read 1 (paired-end). Accepts comma-separated list of files.
out2=<file,file>
Output file for read 2 (paired-end). Accepts comma-separated list of files.
outlist=<fofn>
File containing list of primary output files, one per line.
out1list=<fofn>
File containing list of read 1 output files, one per line.
out2list=<fofn>
File containing list of read 2 output files, one per line.

Mapped/Unmapped Streams

outm=<file,file>
Output file for mapped reads. Accepts comma-separated list of files.
outm1=<file,file>
Output file for mapped read 1 (paired-end).
outm2=<file,file>
Output file for mapped read 2 (paired-end).
outu=<file,file>
Output file for unmapped reads. Accepts comma-separated list of files.
outu1=<file,file>
Output file for unmapped read 1 (paired-end).
outu2=<file,file>
Output file for unmapped read 2 (paired-end).
outb=<file,file>
Output file for blacklisted/filtered reads.
outb1=<file,file>
Output file for blacklisted read 1 (paired-end).
outb2=<file,file>
Output file for blacklisted read 2 (paired-end).
outmlist=<fofn>
File containing list of mapped output files, one per line.
outm1list=<fofn>
File containing list of mapped read 1 output files, one per line.
outm2list=<fofn>
File containing list of mapped read 2 output files, one per line.
outulist=<fofn>
File containing list of unmapped output files, one per line.
outu1list=<fofn>
File containing list of unmapped read 1 output files, one per line.
outu2list=<fofn>
File containing list of unmapped read 2 output files, one per line.

Analysis Output

qualityhistogram=<file,file>
Output quality histogram files. Aliases: qualityhist, qhist.
matchhistogram=<file,file>
Output match histogram files. Aliases: matchhist, mhist.
inserthistogram=<file,file>
Output insert size histogram files. Aliases: inserthist, ihist.
bamscript=<file,file>
BAM script generation files. Alias: bs.

Parameters

BBWrap accepts all standard BBMap parameters plus wrapper-specific options for managing multiple input/output files.

Control Parameters

ref=<file>
Reference fasta file. Only specify for the first run when creating the index. Aliases: reference, fasta.
mapper=bbmap
Select mapping algorithm. Options: bbmap (default), bbmappacbio, bbmappacbioskimmer, bbmap5, bbmapacc, bbsplit.
append=f
Append to files rather than overwriting. When true and exactly one output file is specified, all output is written to that single file.
path=<dir>
Root directory for index storage. Alias: root.

BBMap Parameters: All standard BBMap parameters can be used with BBWrap. See bbmap.sh documentation for complete parameter list.

Examples

Efficient Multi-File Processing

bbwrap.sh ref=large_genome.fasta \
    in=sample1.fq,sample2.fq,sample3.fq,sample4.fq \
    out=mapped1.sam,mapped2.sam,mapped3.sam,mapped4.sam

Process four datasets against a large reference. The index is loaded once and reused for all four mappings, which saves the time of three index loads compared to running BBMap four times separately.

Mixed Paired and Unpaired Reads

bbwrap.sh ref=genome.fasta \
    in1=paired_R1.fq,unpaired.fq \
    in2=paired_R2.fq,null \
    out=all_mapped.sam \
    append

Map both paired-end and unpaired reads to the same reference, outputting all results to a single file. This workflow is impossible with standard BBMap.
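
When the mix of paired and unpaired datasets changes from run to run, the in1/in2 lists can be generated rather than typed by hand. The following is a minimal sketch, not part of BBWrap: build_pair_args is a hypothetical helper, and the "R1:R2" argument convention is an assumption made only for this example (it will not work for file names that contain a colon).

```shell
# build_pair_args (hypothetical helper): each argument is either
#   "R1:R2"  a paired dataset, or
#   "FILE"   an unpaired dataset, which gets "null" as its in2 placeholder.
# Prints "in1=<list> in2=<list>" with the two lists kept in the same order.
build_pair_args() {
    in1=""
    in2=""
    for spec in "$@"; do
        case "$spec" in
            *:*) r1=${spec%%:*}; r2=${spec#*:} ;;
            *)   r1=$spec;       r2=null ;;
        esac
        # ${var:+$var,} adds a comma only when the list is already non-empty.
        in1="${in1:+$in1,}$r1"
        in2="${in2:+$in2,}$r2"
    done
    printf 'in1=%s in2=%s\n' "$in1" "$in2"
}
```

The printed arguments could then be passed along, for example as bbwrap.sh ref=genome.fasta $(build_pair_args paired_R1.fq:paired_R2.fq unpaired.fq) out=all_mapped.sam append. The unquoted substitution relies on word splitting, so this only works for file names without spaces.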

Batch Processing with File Lists

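# Create the input and output file lists, one file name per line
# (printf is used because "echo -e" is not portable across shells)
printf '%s\n' dataset1.fq dataset2.fq dataset3.fq > input_files.txt
printf '%s\n' mapped1.sam mapped2.sam mapped3.sam > output_files.txt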

bbwrap.sh ref=reference.fasta inlist=input_files.txt outlist=output_files.txt

Process multiple files using file lists, useful for automated pipelines with many datasets.
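
For larger batches, the two list files can be derived from a directory listing so that inputs and outputs stay in the same order. This is a sketch under assumptions: write_lists is a hypothetical helper, inputs are assumed to end in .fq, and naming each output after its input with a .sam extension is just one possible convention.

```shell
# write_lists DIR INLIST OUTLIST (hypothetical helper)
# Lists every *.fq file in DIR and writes a matching .sam name for each,
# so that line N of INLIST corresponds to line N of OUTLIST.
write_lists() {
    dir=$1
    inlist=$2
    outlist=$3
    : > "$inlist"     # start with empty list files
    : > "$outlist"
    for fq in "$dir"/*.fq; do
        [ -e "$fq" ] || continue      # skip the literal pattern if nothing matched
        base=$(basename "$fq" .fq)
        printf '%s\n' "$fq" >> "$inlist"
        printf '%s\n' "$dir/$base.sam" >> "$outlist"
    done
}
```

After calling write_lists . input_files.txt output_files.txt, the lists can be passed to the command shown above through inlist= and outlist=.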

Separate Mapped and Unmapped Outputs

bbwrap.sh ref=host_genome.fasta \
    in=sample1.fq,sample2.fq,sample3.fq \
    outm=host_reads1.fq,host_reads2.fq,host_reads3.fq \
    outu=nonhost_reads1.fq,nonhost_reads2.fq,nonhost_reads3.fq

Separate host and non-host reads from multiple samples efficiently, useful for contamination removal workflows.

PacBio Long Read Processing

bbwrap.sh ref=reference.fasta \
    in=pacbio_run1.fq,pacbio_run2.fq \
    out=mapped_run1.sam,mapped_run2.sam \
    mapper=bbmappacbio

Process multiple PacBio datasets using the specialized long-read mapper, sharing the index across runs.

Quality Control Workflow

bbwrap.sh ref=reference.fasta \
    in=sample1.fq,sample2.fq \
    out=mapped1.sam,mapped2.sam \
    qhist=quality1.txt,quality2.txt \
    ihist=insert1.txt,insert2.txt \
    mhist=match1.txt,match2.txt

Generate mapping results and quality control statistics for multiple samples in a single run.

Algorithm Details

Index Reuse Strategy

BBWrap's primary efficiency comes from index persistence. When processing multiple datasets, the index is loaded once and reused for every mapping run instead of being reloaded for each dataset.

This approach saves the most time when index loading time is significant relative to mapping time, particularly with large references and smaller read datasets.
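
The trade-off can be estimated with simple arithmetic. The sketch below assumes a deliberately rough model in which total time is just index loading plus mapping, and it ignores index building and all other overhead. estimate_saving is a hypothetical helper and any timings passed to it are placeholders, not measurements of BBMap.

```shell
# estimate_saving LOAD_SECONDS MAP_SECONDS N_DATASETS (hypothetical helper)
# Compares N separate runs (index loaded every time) with one wrapped run
# (index loaded once). All values are whole seconds supplied by the caller.
estimate_saving() {
    load=$1
    map=$2
    n=$3
    separate=$(( n * (load + map) ))   # load + map, repeated per dataset
    wrapped=$(( load + n * map ))      # one load, then map each dataset
    printf 'separate=%ss wrapped=%ss saved=%ss\n' \
        "$separate" "$wrapped" "$(( separate - wrapped ))"
}
```

Under this model the saving is always (N - 1) index loads, so it matters most when the load time is large compared with the per-dataset mapping time.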

File Coordination

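BBWrap maintains parallel lists of input and output files and processes them in lockstep: the first input is written to the first output, the second input to the second output, and so on. The exception is the append case described above, in which a single output file receives the results of every input.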
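
Because the lists are matched by position, it can help to confirm that they have the same length before starting a long run. The following is a sketch of such a pre-flight check; count_csv and check_parallel are hypothetical helpers, and the check deliberately does not cover the single-output append case, where one output for many inputs is intended.

```shell
# count_csv LIST (hypothetical helper): number of comma-separated entries.
# An empty string counts as zero entries.
count_csv() {
    [ -n "$1" ] || { echo 0; return 0; }
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c | tr -d ' ')
    echo $(( commas + 1 ))
}

# check_parallel INPUTS OUTPUTS (hypothetical helper): succeeds only when
# both comma-separated lists contain the same number of entries.
check_parallel() {
    n_in=$(count_csv "$1")
    n_out=$(count_csv "$2")
    if [ "$n_in" -ne "$n_out" ]; then
        echo "mismatch: $n_in inputs but $n_out outputs" >&2
        return 1
    fi
    return 0
}
```

A wrapper script might call check_parallel "$inputs" "$outputs" and only invoke bbwrap.sh in="$inputs" out="$outputs" when it succeeds.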

Mapper Selection

BBWrap can delegate to different alignment algorithms based on the mapper parameter: bbmap (default), bbmappacbio, bbmappacbioskimmer, bbmap5, bbmapacc, or bbsplit.
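
In a pipeline where the mapper name comes from a configuration value, it can be checked against the options listed for the mapper parameter before any work starts. This is only a sketch: valid_mapper is a hypothetical helper, and the accepted names are copied from the parameter description above rather than queried from BBWrap itself.

```shell
# valid_mapper NAME (hypothetical helper): succeeds when NAME is one of the
# mapper options documented above, fails otherwise.
valid_mapper() {
    case "$1" in
        bbmap|bbmappacbio|bbmappacbioskimmer|bbmap5|bbmapacc|bbsplit)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, a script could run valid_mapper "$mapper" and stop with an error message when it fails, instead of discovering a typo after the index has been loaded.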

Memory Management

BBWrap processes datasets sequentially rather than simultaneously, maintaining constant memory usage regardless of the number of input files. The shared index remains in memory across all runs, but read data is processed one dataset at a time.

Performance Characteristics

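Performance benefits are most pronounced when the reference is large and each dataset contains relatively few reads, so that index loading would otherwise account for a large share of the total runtime.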

Limitations and Considerations