Consect

Basic Usage

consect.sh in=<file,file,file,...> out=<file>

Consect requires a minimum of 3 input files: the first file must contain the original uncorrected reads, followed by at least two files containing corrected reads from different error-correction tools. All files must have reads in the same order.

Parameters

Consect accepts standard BBTools parameters and Java memory options. They are grouped below into standard parameters and Java parameters.

Standard Parameters

in=: A comma-delimited list of files; minimum of 3. All files must have reads in the same order. The first file must contain the uncorrected reads. All additional files must contain corrected reads.
out=<file>: Output of consensus reads.
overwrite=f: (ow) Set to false to force the program to abort rather than overwrite an existing file.
cq=f: (changequality) Set to true to update quality scores based on the consensus corrections. When enabled, quality scores are set to the maximum quality from the corrected reads at positions where consensus is achieved.
verbose=f: Set to true to print verbose messages during processing, useful for debugging and monitoring progress.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Examples

Basic Consensus Generation

consect.sh in=original.fq,tadpole_corrected.fq,bless_corrected.fq out=consensus.fq

Generate consensus from original reads and two different error correction tools (Tadpole and BLESS). Only corrections where both tools agree will be applied.

Consensus with Quality Score Updates

consect.sh in=raw.fq,corrected1.fq,corrected2.fq,corrected3.fq out=consensus.fq cq=t

Generate consensus from three correction tools and update quality scores at corrected positions.

Verbose Mode for Monitoring

consect.sh in=original.fq,spades_ec.fq,karect.fq out=consensus.fq verbose=t

Run with verbose output to monitor processing progress and see detailed statistics.

Algorithm Details

Conservative Consensus Strategy

Consect implements a conservative consensus algorithm designed specifically for substitution corrections. The algorithm processes reads position by position, applying corrections only when all error-correction tools agree on the same base change.

Position-by-Position Analysis

For each position in a read, the algorithm:

Counts base votes: Examines the base called by each correction tool at that position
Requires unanimity: Only applies corrections when all tools agree on the replacement base
Handles disagreements: When tools disagree, the original base is preserved
Manages variable lengths: Handles cases where some reads are shorter than others
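
The per-position rule above can be sketched in Python as follows. This is a minimal illustration, not Consect's actual implementation. The function name consensus_bases is hypothetical, and reads are treated as plain strings. Two behaviors are assumptions: positions not covered by every corrected read are left unchanged, and any position where the tools report different bases counts as one disagreement.

```python
from typing import List, Tuple


def consensus_bases(original: str, corrected: List[str]) -> Tuple[str, int, int]:
    """Return (consensus, corrections, disagreements) for a single read.

    Hypothetical helper for illustration only.
    """
    out = list(original)
    corrections = 0
    disagreements = 0
    for i, orig_base in enumerate(original):
        # Assumption: skip positions that some shorter corrected read lacks.
        if any(i >= len(read) for read in corrected):
            continue
        # One vote per tool: the base it reports at this position.
        votes = {read[i] for read in corrected}
        if votes == {orig_base}:
            continue  # every tool kept the original base
        if len(votes) == 1:
            out[i] = next(iter(votes))  # unanimous replacement base
            corrections += 1
        else:
            disagreements += 1  # tools differ, so the original base is kept
    return "".join(out), corrections, disagreements
```

For example, with original read ACGT and corrected reads ACCT and ACCT, the third base is replaced; if the second tool had returned ACGT instead, the original base would be kept and one disagreement recorded.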

Quality Score Management

When the cq=t parameter is enabled:

Quality scores at corrected positions are set to the maximum quality from all contributing tools
Original quality scores are preserved at uncorrected positions
This provides confidence estimates for the consensus corrections
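
The quality rule described above can be illustrated with a short Python sketch. It assumes quality scores are available as lists of integer Phred values and that the corrected positions are already known. The function name update_qualities and its arguments are hypothetical, not part of Consect.

```python
from typing import Iterable, List


def update_qualities(
    original_quals: List[int],
    corrected_quals: List[List[int]],
    corrected_positions: Iterable[int],
) -> List[int]:
    """Return a copy of original_quals with corrected positions raised.

    Hypothetical helper: at each corrected position the score becomes the
    maximum score reported by any corrected read; other positions are kept.
    """
    out = list(original_quals)
    for i in corrected_positions:
        out[i] = max(quals[i] for quals in corrected_quals)
    return out
```

With original scores [10, 10, 10], corrected scores [20, 30, 5] and [25, 12, 40], and a correction at index 1, only the middle score changes, giving [10, 30, 10].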

Performance Characteristics

Memory usage: Scales linearly with read length and number of input files
Processing speed: Typically limited by I/O, since all input files must be read in lockstep
Accuracy: High precision (few false positives) due to conservative approach, but lower recall (some true corrections may be missed)

Statistical Reporting

Consect provides comprehensive statistics including:

Total errors corrected and disagreements found
Number of reads with corrections vs. disagreements
Classification of reads as fully corrected, partially corrected, uncorrected, or error-free
Processing throughput (reads and bases per second)

Design Limitations

Important constraints of the consensus algorithm:

Substitutions only: Designed for base substitution corrections, not insertions or deletions
Read order dependency: All input files must contain reads in identical order
Minimum input requirement: Requires at least 3 files (original + 2 corrected versions)
Conservative bias: Favors precision over recall, potentially missing some valid corrections

Output Statistics

Consect provides detailed statistics about the consensus process:

Correction Metrics

Errors Corrected: Total number of positions where unanimous corrections were applied
Disagreements: Total number of positions where correction tools disagreed

Read-Level Statistics

Reads With Corrections: Number of reads that received at least one correction
Reads With Disagreements: Number of reads where tools disagreed on at least one position
Reads Fully Corrected: Reads with corrections but no disagreements
Reads Partly Corrected: Reads with both corrections and disagreements
Reads Not Corrected: Reads with disagreements but no corrections applied
Reads Error Free: Reads with no corrections or disagreements
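
The four read categories follow directly from the per-read correction and disagreement counts. A minimal Python sketch of that mapping, using a hypothetical function name classify_read, might look like this:

```python
def classify_read(corrections: int, disagreements: int) -> str:
    """Map per-read counts to the category names used in the statistics.

    Hypothetical helper for illustration; the labels mirror the definitions
    listed above.
    """
    if corrections > 0 and disagreements == 0:
        return "fully corrected"   # corrections but no disagreements
    if corrections > 0:
        return "partly corrected"  # both corrections and disagreements
    if disagreements > 0:
        return "not corrected"     # disagreements only
    return "error free"            # neither corrections nor disagreements
```

Each read falls into exactly one category, so the four category counts should sum to the total number of reads processed.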

Best Practices

Input File Preparation

Ensure all input files contain reads in identical order
Use the same file format (FASTQ recommended) for all inputs
Verify that every correction tool was run on the same uncorrected input file, so each corrected read corresponds to the original read at the same position
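
One way to check read order before running Consect is to compare read IDs record by record. The Python sketch below is illustrative only and makes several assumptions: the inputs are uncompressed FASTQ files with exactly four lines per record, and read IDs can be compared using the first whitespace-delimited token of each header line. Some correction tools rewrite headers, in which case the comparison would need adapting. The function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, List, Optional


def read_names(path: str) -> Iterator[str]:
    """Yield the read ID of each record in a four-line FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                tokens = line[1:].split()  # drop the leading '@'
                yield tokens[0] if tokens else ""


def first_order_mismatch(paths: List[str]) -> Optional[int]:
    """Return the 0-based index of the first record whose IDs differ.

    Returns None when all files list the same IDs in the same order.
    A file that ends early also counts as a mismatch at that record.
    """
    iterators = [read_names(path) for path in paths]
    for index, names in enumerate(zip_longest(*iterators)):
        if len(set(names)) != 1:
            return index
    return None
```

Calling first_order_mismatch with the same file list that will be passed to in= gives a quick sanity check: a result of None suggests the files are aligned, while an integer points at the first record to inspect.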

Tool Selection

Choose diverse error-correction algorithms for better consensus reliability
Consider tools with different algorithmic approaches (k-mer based, overlap based, etc.)
Avoid tools or settings that trim or discard reads, since dropped or reordered reads break the requirement that all files list reads in the same order

Parameter Tuning

Enable cq=t when quality scores are important for downstream analysis
Use verbose=t for initial runs to understand correction patterns
Set -Xmx explicitly for large datasets if the autodetected memory limit is not appropriate

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org