BBWrap

When to Use BBWrap

BBWrap is designed for specific scenarios where standard BBMap is inefficient or cannot handle the input:

Primary Use Cases

Multiple datasets with large references: Save time by avoiding repeated index loading for each dataset
Mixed paired and unpaired reads: Standard BBMap cannot process both paired and unpaired reads in the same run; BBWrap can
Batch processing workflows: Process many files systematically while maintaining consistent reference indexing
Resource-constrained environments: Particularly beneficial with a small number of reads and a large reference genome

Important Limitation: BBWrap will not work with stdin/stdout or histogram output. Use standard BBMap for those cases.

Basic Usage

bbwrap.sh ref=<reference fasta> in=<file,file,...> out=<file,file,...>

Usage Patterns

To index only:

bbwrap.sh ref=<reference fasta>

To map to an existing index:

bbwrap.sh in=<file,file,...> out=<file,file,...>

To map pairs and singletons to the same output file:

bbwrap.sh in1=read1.fq,singleton.fq in2=read2.fq,null out=mapped.sam append

Input Parameters

in=<file,file>: Input sequences to map. Accepts comma-separated list of files.
in1=<file,file>: Input sequences for read 1 (paired-end). Accepts comma-separated list of files.
in2=<file,file>: Input sequences for read 2 (paired-end). Use "null" as placeholder for unpaired datasets.
inlist=<fofn>: File containing list of input files, one per line. Alternative to comma-separated lists.
in1list=<fofn>: File containing list of read 1 input files, one per line.
in2list=<fofn>: File containing list of read 2 input files, one per line.

Output Parameters

Primary Output

out=<file,file>: Primary output files. Accepts comma-separated list of files.
out1=<file,file>: Output file for read 1 (paired-end). Accepts comma-separated list of files.
out2=<file,file>: Output file for read 2 (paired-end). Accepts comma-separated list of files.
outlist=<fofn>: File containing list of primary output files, one per line.
out1list=<fofn>: File containing list of read 1 output files, one per line.
out2list=<fofn>: File containing list of read 2 output files, one per line.

Mapped/Unmapped Streams

outm=<file,file>: Output file for mapped reads. Accepts comma-separated list of files.
outm1=<file,file>: Output file for mapped read 1 (paired-end).
outm2=<file,file>: Output file for mapped read 2 (paired-end).
outu=<file,file>: Output file for unmapped reads. Accepts comma-separated list of files.
outu1=<file,file>: Output file for unmapped read 1 (paired-end).
outu2=<file,file>: Output file for unmapped read 2 (paired-end).
outb=<file,file>: Output file for blacklisted/filtered reads.
outb1=<file,file>: Output file for blacklisted read 1 (paired-end).
outb2=<file,file>: Output file for blacklisted read 2 (paired-end).
outmlist=<fofn>: File containing list of mapped output files, one per line.
outm1list=<fofn>: File containing list of mapped read 1 output files, one per line.
outm2list=<fofn>: File containing list of mapped read 2 output files, one per line.
outulist=<fofn>: File containing list of unmapped output files, one per line.
outu1list=<fofn>: File containing list of unmapped read 1 output files, one per line.
outu2list=<fofn>: File containing list of unmapped read 2 output files, one per line.

Analysis Output

qualityhistogram=<file,file>: Output quality histogram files. Aliases: qualityhist, qhist.
matchhistogram=<file,file>: Output match histogram files. Aliases: matchhist, mhist.
inserthistogram=<file,file>: Output insert size histogram files. Aliases: inserthist, ihist.
bamscript=<file,file>: BAM script generation files. Alias: bs.

Parameters

BBWrap accepts all standard BBMap parameters plus wrapper-specific options for managing multiple input/output files.

Control Parameters

ref=<file>: Reference fasta file. Only specify for the first run when creating the index. Aliases: reference, fasta.
mapper=bbmap: Select mapping algorithm. Options: bbmap (default), bbmappacbio, bbmappacbioskimmer, bbmap5, bbmapacc, bbsplit.
append=f: Append to files rather than overwriting. When true and exactly one output file is specified, all output is written to that single file.
path=<dir>: Root directory for index storage. Alias: root.

BBMap Parameters: See the bbmap.sh documentation for the complete list of standard parameters.

Examples

Efficient Multi-File Processing

bbwrap.sh ref=large_genome.fasta \
    in=sample1.fq,sample2.fq,sample3.fq,sample4.fq \
    out=mapped1.sam,mapped2.sam,mapped3.sam,mapped4.sam

Process four datasets against a large reference. The index is loaded once and reused for all four mappings, which saves the index loading time that three of four separate BBMap runs would otherwise repeat.

Mixed Paired and Unpaired Reads

bbwrap.sh ref=genome.fasta \
    in1=paired_R1.fq,unpaired.fq \
    in2=paired_R2.fq,null \
    out=all_mapped.sam \
    append

Map both paired-end and unpaired reads to the same reference, writing all results to a single file. This is not possible in a single standard BBMap run.
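
When datasets are collected programmatically, the in1 and in2 lists have to stay the same length, with "null" standing in for the missing read 2 file of each unpaired dataset. The following is a minimal sketch of one way to build them; the helper name add_dataset is hypothetical, and the command is printed rather than run so the sketch works without BBTools installed.

```shell
# Hypothetical helper: append one dataset to the in1/in2 lists.
# $1 = read 1 (or unpaired) file; $2 = read 2 file, omitted for unpaired data.
in1=""
in2=""
add_dataset() {
    in1="${in1:+$in1,}$1"
    # Use the "null" placeholder when no read 2 file is given.
    in2="${in2:+$in2,}${2:-null}"
}

add_dataset paired_R1.fq paired_R2.fq
add_dataset unpaired.fq

# Print the resulting command instead of executing it.
echo "bbwrap.sh ref=genome.fasta in1=$in1 in2=$in2 out=all_mapped.sam append"
```

Keeping both lists in one helper means a dataset can never be added to in1 without a matching entry in in2.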

Batch Processing with File Lists

# Create input and output file lists, one file per line
printf '%s\n' dataset1.fq dataset2.fq dataset3.fq > input_files.txt
printf '%s\n' mapped1.sam mapped2.sam mapped3.sam > output_files.txt

bbwrap.sh ref=reference.fasta inlist=input_files.txt outlist=output_files.txt

Process multiple files using file lists, useful for automated pipelines with many datasets.
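
In a pipeline, the output list is usually derived from the input list rather than typed by hand. The sketch below assumes every input ends in .fq and that each output simply swaps that suffix for .sam; the helper name make_lists and the temporary directory are illustrative choices, not part of BBWrap. The command is printed, not executed.

```shell
# Hypothetical helper: write matching input and output lists.
# $1 = input list path, $2 = output list path, remaining args = .fq files.
make_lists() {
    in_list=$1
    out_list=$2
    shift 2
    : > "$in_list"
    : > "$out_list"
    for fq in "$@"; do
        printf '%s\n' "$fq" >> "$in_list"
        # Assumption: outputs are named by replacing .fq with .sam.
        printf '%s\n' "${fq%.fq}.sam" >> "$out_list"
    done
}

workdir=$(mktemp -d)
make_lists "$workdir/input_files.txt" "$workdir/output_files.txt" \
    dataset1.fq dataset2.fq dataset3.fq

# Print the command so the sketch runs without BBTools installed.
echo "bbwrap.sh ref=reference.fasta inlist=$workdir/input_files.txt outlist=$workdir/output_files.txt"
```

Writing both lists in the same loop keeps line N of the input list aligned with line N of the output list.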

Separate Mapped and Unmapped Outputs

bbwrap.sh ref=host_genome.fasta \
    in=sample1.fq,sample2.fq,sample3.fq \
    outm=host_reads1.fq,host_reads2.fq,host_reads3.fq \
    outu=nonhost_reads1.fq,nonhost_reads2.fq,nonhost_reads3.fq

Separate host and non-host reads from multiple samples efficiently, useful for contamination removal workflows.
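
With more than a few samples, the three comma-separated lists are easier to generate than to type. This is a small sketch under the assumption that files follow the naming pattern used above; join_names is a hypothetical helper, and the command is only printed.

```shell
# Hypothetical helper: build "prefix<ID>suffix" for each ID, comma-separated.
# $1 = prefix, $2 = suffix, remaining args = sample IDs.
join_names() {
    prefix=$1
    suffix=$2
    shift 2
    joined=""
    for id in "$@"; do
        joined="${joined:+$joined,}${prefix}${id}${suffix}"
    done
    printf '%s\n' "$joined"
}

in_files=$(join_names sample .fq 1 2 3)
outm_files=$(join_names host_reads .fq 1 2 3)
outu_files=$(join_names nonhost_reads .fq 1 2 3)

# Print the command so the sketch runs without BBTools installed.
echo "bbwrap.sh ref=host_genome.fasta in=$in_files outm=$outm_files outu=$outu_files"
```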

PacBio Long Read Processing

bbwrap.sh ref=reference.fasta \
    in=pacbio_run1.fq,pacbio_run2.fq \
    out=mapped_run1.sam,mapped_run2.sam \
    mapper=bbmappacbio

Process multiple PacBio datasets using the specialized long-read mapper, sharing the index across runs.

Quality Control Workflow

bbwrap.sh ref=reference.fasta \
    in=sample1.fq,sample2.fq \
    out=mapped1.sam,mapped2.sam \
    qhist=quality1.txt,quality2.txt \
    ihist=insert1.txt,insert2.txt \
    mhist=match1.txt,match2.txt

Generate mapping results and quality control statistics for multiple samples in a single run.

Algorithm Details

Index Reuse Strategy

BBWrap's primary efficiency comes from index persistence. When processing multiple datasets:

First dataset: Loads reference and builds index (normal BBMap overhead)
Subsequent datasets: Reuses existing index in memory (near-zero index overhead)

This approach saves the most time when index loading is significant relative to mapping time, as with large references and small read datasets.
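
The size of the saving can be estimated with simple arithmetic. The timings below are made-up illustrative values, not measurements, and the model assumes each separate BBMap run pays the full index loading cost.

```shell
# Illustrative numbers only (not measurements).
index_load_s=600   # seconds to load the index
map_s=120          # seconds to map one dataset
datasets=4

# Separate BBMap runs pay the index cost once per dataset.
separate_s=$(( datasets * (index_load_s + map_s) ))
# A single BBWrap run pays it once in total.
wrapped_s=$(( index_load_s + datasets * map_s ))
saved_s=$(( separate_s - wrapped_s ))

echo "separate runs:     ${separate_s}s"
echo "single BBWrap run: ${wrapped_s}s"
echo "saved:             ${saved_s}s"
```

Under this model the saving is (datasets - 1) times the index loading time, so it grows with the number of datasets and with reference size, but not with read count.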

File Coordination

BBWrap maintains parallel lists of input and output files, processing them in synchronized fashion:

Position-based matching: First input maps to first output, second to second, etc.
Append mode exception: When append=true and a single output file is specified, all inputs are written to that same output
Null placeholders: Use "null" in file lists to handle mixed paired/unpaired datasets
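
Because matching is purely positional, a miscounted list silently pairs the wrong files. The sketch below checks the matching rules above before a run is launched; count_items and check_pairing are hypothetical helper names, and accepting "t" or "true" for the append value is an assumption of this sketch.

```shell
# Count the entries in a comma-separated list.
# Runs in a subshell so the IFS and globbing changes do not leak.
count_items() (
    set -f
    IFS=,
    set -- $1
    echo $#
)

# $1 = input list, $2 = output list, $3 = append setting.
check_pairing() {
    n_in=$(count_items "$1")
    n_out=$(count_items "$2")
    # Position-based matching: the lists must be the same length.
    if [ "$n_in" -eq "$n_out" ]; then
        return 0
    fi
    # Append mode exception: a single output may receive all inputs.
    if [ "$n_out" -eq 1 ]; then
        case $3 in
            t|true) return 0 ;;
        esac
    fi
    echo "mismatch: $n_in inputs but $n_out outputs" >&2
    return 1
}

if check_pairing "sample1.fq,sample2.fq" "mapped1.sam,mapped2.sam" f; then
    echo "lists are consistent"
fi
```

The same check applies to in1 against in2, where every unpaired dataset must contribute a "null" entry to in2.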

Mapper Selection

BBWrap can delegate to different alignment algorithms based on the mapper parameter:

bbmap: Standard short-read aligner (default)
bbmappacbio: Optimized for PacBio long reads with high error rates
bbmappacbioskimmer: Fast approximate mapping for error correction workflows
bbmap5: Enhanced version with additional features
bbmapacc: High-accuracy variant for sensitive applications
bbsplit: Multi-reference mapping for contamination detection
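
A wrapper script can reject a mistyped mapper name before starting a long run. This is a minimal sketch that checks against the six values listed above; is_known_mapper is a hypothetical helper, and the command is printed rather than executed.

```shell
# Return 0 if the name is one of the mapper values listed above.
is_known_mapper() {
    case $1 in
        bbmap|bbmappacbio|bbmappacbioskimmer|bbmap5|bbmapacc|bbsplit)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

mapper=bbmappacbio
if is_known_mapper "$mapper"; then
    # Print the command so the sketch runs without BBTools installed.
    echo "bbwrap.sh in=reads.fq out=mapped.sam mapper=$mapper"
else
    echo "unknown mapper: $mapper" >&2
fi
```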

Memory Management

BBWrap processes datasets sequentially rather than simultaneously, maintaining constant memory usage regardless of the number of input files. The shared index remains in memory across all runs, but read data is processed one dataset at a time.

Performance Characteristics

Performance benefits are most pronounced when:

Index size >> read data size: Large genomes with relatively small read datasets
Multiple similar datasets: Same reference, multiple samples
I/O-constrained environments: Systems where disk access is the bottleneck

Limitations and Considerations

No stdin/stdout support: Cannot use with pipes or stream processing
No histogram output support: Use standard BBMap for detailed statistical analysis
Sequential processing: Datasets are processed one at a time, not in parallel
Index persistence requirement: Index must fit in available memory for the duration of all runs
Single reference limitation: All datasets must map to the same reference sequence
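
The stdin/stdout limitation can be caught with a pre-flight check on the file lists. The sketch below assumes streams are requested with names such as stdin.fq or stdout.sam, which is the usual BBTools convention; treat that naming, and the helper name uses_stream, as assumptions of this example.

```shell
# Return 0 if any entry in a comma-separated list names a standard stream.
# Runs in a subshell so the IFS and globbing changes do not leak.
uses_stream() (
    set -f
    IFS=,
    for name in $1; do
        case $name in
            # Assumed naming: "stdin"/"stdout", optionally with an extension.
            stdin|stdin.*|stdout|stdout.*) exit 0 ;;
        esac
    done
    exit 1
)

inputs="sample1.fq,stdin.fq"
if uses_stream "$inputs"; then
    echo "BBWrap cannot read from stdin; use bbmap.sh for this input" >&2
else
    echo "inputs are regular files"
fi
```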