RemoveSmartBell

Script: removesmartbell.sh
Package: pacbio
Class: RemoveAdapters2.java

Removes Smart Bell adapters from PacBio reads using MultiStateAligner algorithms, with locality-aware adapter detection and an optional backup aligner for increased sensitivity.

Basic Usage

removesmartbell.sh in=<input> out=<output> split=t

Input may be FASTA or FASTQ, compressed or uncompressed. H5 files are not supported.

Parameters

Parameters control adapter detection, processing modes, and output formatting.

Core Parameters

in=file
Specify the input file, or stdin. Can use # notation for paired files (e.g., reads#.fq becomes reads1.fq and reads2.fq).
in2=file
Specify the second input file for paired data.
out=file
Specify the output file, or stdout. Can use # notation for paired output files.
adapter=string
Specify the adapter sequence. The default is the normal SMRTbell adapter sequence (ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT).
split=t
Processing mode. split=t splits reads at adapters into separate contigs; split=f masks adapters with X symbols but keeps reads intact. Default: t
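
The # notation described for in= and out= can be pictured as a simple string substitution. The sketch below is illustrative only: expand_hash is a hypothetical helper, not part of removesmartbell.sh, and it assumes that only the first # in the name is replaced.

```sh
# Hypothetical helper: expand '#' notation into a pair of file names.
# Prints the two names on separate lines; a name without '#' is printed once, unchanged.
# Assumption: only the first '#' is substituted.
expand_hash() {
    name=$1
    case $name in
        *'#'*)
            prefix=${name%%'#'*}   # text before the first '#'
            suffix=${name#*'#'}    # text after the first '#'
            printf '%s\n' "${prefix}1${suffix}" "${prefix}2${suffix}"
            ;;
        *)
            printf '%s\n' "$name"
            ;;
    esac
}

expand_hash 'reads#.fq'
# prints:
# reads1.fq
# reads2.fq
```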

Alignment Parameters

minratio=0.31
Minimum alignment score ratio for adapter detection. On 250 bp reads, this gives approximately a 0.01% false-positive rate and a 94% true-positive rate.
suspectratio=0.85
Ratio threshold for suspect alignments that may be confirmed by nearby adapters or secondary alignment methods.
usealtmsa=t
Enables an alternate multi-state alignment algorithm for improved sensitivity, using MultiStateAligner9PacBioAdapter2 as the backup method.
plusonly=f
Search for adapters only in the forward orientation. When true, the reverse-complement search is disabled.
minusonly=f
Search for adapters only in the reverse-complement orientation. When true, the forward search is disabled.
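
To make the two orientations concrete, the sketch below computes the reverse complement of a sequence, which is the form of the adapter that a reverse-orientation search would look for. revcomp is a hypothetical helper written for illustration; it does not show how the tool computes this internally.

```sh
# Hypothetical helper: print the reverse complement of a DNA string (A/C/G/T, either case).
revcomp() {
    # Complement each base first, then reverse the string one character at a time.
    comp=$(printf '%s' "$1" | tr 'ACGTacgt' 'TGCAtgca')
    out=
    while [ -n "$comp" ]; do
        rest=${comp#?}             # everything after the first character
        out=${comp%"$rest"}$out    # prepend the first character to the result
        comp=$rest
    done
    printf '%s\n' "$out"
}

revcomp ACCGT    # prints ACGGT
revcomp ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT
```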

Quality Control Parameters

mincontig=50
Minimum contig length to retain after splitting. Shorter sequences are discarded.
reads=unlimited
Maximum number of reads to process. Supports K/M/G suffixes (e.g., reads=1M).
maxreads=unlimited
Alias for reads parameter.
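
As a rough illustration of the K/M/G suffixes accepted by reads=, the following sketch converts a suffixed count to a plain integer. parse_kmg is a hypothetical helper, and two things are assumptions rather than documented behavior: the multipliers are taken to be decimal (K=1000, M=1000000, G=1000000000), and lowercase suffixes are accepted.

```sh
# Hypothetical helper: convert a count such as 1M or 250k to a plain integer.
# Assumptions: decimal multipliers, and suffixes accepted in either case.
parse_kmg() {
    value=$1
    number=${value%[KkMmGg]}    # the digits, with any trailing suffix removed
    case $value in
        *[Kk]) echo $((number * 1000)) ;;
        *[Mm]) echo $((number * 1000000)) ;;
        *[Gg]) echo $((number * 1000000000)) ;;
        *)     echo "$number" ;;
    esac
}

parse_kmg 1M      # prints 1000000
parse_kmg 250k    # prints 250000
```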

Processing Parameters

threads=auto
Number of processing threads. Use 'auto' to detect available processors automatically.
overwrite=t
Allow overwriting of existing output files.
append=f
Append to existing output files instead of overwriting.
verbose=f
Print additional processing information and statistics.

Common Parser Parameters

path=
Set the path for temporary files and working directory.
tempdir=
Specify temporary directory for intermediate files.
root=
Root directory for relative file paths.

Examples

Basic Adapter Removal

removesmartbell.sh in=pacbio_reads.fq out=clean_reads.fq split=t

Removes Smart Bell adapters from PacBio reads, splitting reads at adapter locations.

Mask Adapters Instead of Splitting

removesmartbell.sh in=pacbio_reads.fq out=masked_reads.fq split=f

Masks adapter sequences with X symbols but keeps reads intact as single sequences.

Custom Adapter Sequence

removesmartbell.sh in=reads.fq out=clean.fq adapter=CUSTOMADAPTERSEQUENCE

Uses a custom adapter sequence instead of the default Smart Bell adapter.

High Sensitivity Processing

removesmartbell.sh in=reads.fq out=clean.fq minratio=0.25 usealtmsa=t

Increases sensitivity by lowering alignment threshold and enabling alternate alignment algorithm.

Paired-End Processing

removesmartbell.sh in=reads#.fq out=clean#.fq

Processes paired-end files reads1.fq and reads2.fq, outputting to clean1.fq and clean2.fq.
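
Batch Processing (sketch)

When several files need the same treatment, the commands above can be generated in a loop. The helper below is a dry-run sketch that only prints the commands. build_commands, the runs/ paths, and the clean_ output prefix are all hypothetical choices made for illustration; only the in=, out=, and split= parameters come from this page.

```sh
# Hypothetical dry-run helper: print one removesmartbell.sh command per input file.
# Output names are formed by prefixing "clean_" to the file name (an arbitrary choice).
build_commands() {
    for input in "$@"; do
        dir=$(dirname "$input")
        base=$(basename "$input")
        printf 'removesmartbell.sh in=%s out=%s/clean_%s split=t\n' "$input" "$dir" "$base"
    done
}

build_commands runs/a.fq runs/b.fq
# prints:
# removesmartbell.sh in=runs/a.fq out=runs/clean_a.fq split=t
# removesmartbell.sh in=runs/b.fq out=runs/clean_b.fq split=t
```

Piping the printed lines to sh would run them, once the commands have been checked by eye.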

Algorithm Details

Alignment Strategy

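RemoveSmartBell uses MultiStateAligner algorithms specifically designed for PacBio adapter detection. When usealtmsa=t, MultiStateAligner9PacBioAdapter2 serves as a backup aligner for improved sensitivity.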

Scoring and Thresholds

The algorithm implements a multi-threshold scoring system based on Smith-Waterman alignment scores:

Processing Strategy

The algorithm processes reads using a sliding window approach:

Output Modes

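Two primary processing modes are available: split mode (split=t), which splits reads at adapters into separate contigs, and mask mode (split=f), which masks adapters with X symbols but keeps reads intact.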
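
The difference between the split=t and split=f modes, together with the mincontig filter, can be sketched on a single read. This is a simplified illustration under stated assumptions, not the tool's implementation: process_read is a hypothetical helper, the adapter position is supplied by hand as 1-based inclusive coordinates rather than found by alignment, and only one adapter hit per read is handled.

```sh
# Hypothetical sketch: apply split or mask mode to one read.
# Arguments: sequence, adapter start, adapter end (1-based, inclusive),
#            mode (t = split, f = mask), minimum contig length.
process_read() {
    printf '%s\n' "$1" | awk -v s="$2" -v e="$3" -v mode="$4" -v minlen="$5" '
    {
        left  = substr($0, 1, s - 1)    # bases before the adapter
        right = substr($0, e + 1)       # bases after the adapter
        if (mode == "t") {
            # Split mode: emit each flanking piece that is long enough.
            if (length(left)  >= minlen) print left
            if (length(right) >= minlen) print right
        } else {
            # Mask mode: keep the read whole and overwrite the adapter with X.
            mask = ""
            for (i = s; i <= e; i++) mask = mask "X"
            print left mask right
        }
    }'
}

process_read AAAAACCCGGGGG 6 8 t 3    # prints AAAAA and GGGGG on separate lines
process_read AAAAACCCGGGGG 6 8 f 3    # prints AAAAAXXXGGGGG
```

With a larger minimum length (for example 6 in the call above), split mode prints nothing, because both flanking pieces are shorter than the threshold.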

Performance Characteristics

Implementation characteristics based on source code analysis:

Statistics and Output

RemoveSmartBell provides comprehensive statistics on adapter detection performance:

Processing Statistics

Accuracy Metrics

For synthetic data with known adapter positions:

Support

For questions and support: