Consect
Generates the conservative consensus of multiple error-correction tools. Corrections will be accepted only if all tools agree. This tool is designed for substitutions only, not indel corrections.
Basic Usage
consect.sh in=<file,file,file,...> out=<file>
Consect requires a minimum of 3 input files: the first file must contain the original uncorrected reads, followed by at least two files containing corrected reads from different error-correction tools. All files must have reads in the same order.
Parameters
Consect accepts standard BBTools parameters and Java memory options. They are listed below in two groups: standard parameters and Java parameters.
Standard Parameters
- in=<file,file,file,...>
- A comma-delimited list of files; minimum of 3. All files must have reads in the same order. The first file must contain the uncorrected reads. All additional files must contain corrected reads.
- out=<file>
- Output of consensus reads.
- overwrite=f
- (ow) Set to false to force the program to abort rather than overwrite an existing file.
- cq=f
- (changequality) Set to true to update quality scores based on the consensus corrections. When enabled, quality scores are set to the maximum quality from the corrected reads at positions where consensus is achieved.
- verbose=f
- Set to true to print verbose messages during processing, useful for debugging and monitoring progress.
Java Parameters
- -Xmx
- This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Examples
Basic Consensus Generation
consect.sh in=original.fq,tadpole_corrected.fq,bless_corrected.fq out=consensus.fq
Generate consensus from original reads and two different error correction tools (Tadpole and BLESS). Only corrections where both tools agree will be applied.
Consensus with Quality Score Updates
consect.sh in=raw.fq,corrected1.fq,corrected2.fq,corrected3.fq out=consensus.fq cq=t
Generate consensus from three correction tools and update quality scores at corrected positions.
Verbose Mode for Monitoring
consect.sh in=original.fq,spades_ec.fq,karect.fq out=consensus.fq verbose=t
Run with verbose output to monitor processing progress and see detailed statistics.
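When Consect is launched from a pipeline script, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of BBTools: build_consect_command is a hypothetical helper, it assumes consect.sh is on the PATH, and it uses only the in=, out=, cq= and -Xmx options documented on this page.

```python
def build_consect_command(original, corrected, out, cq=False, xmx=None):
    """Build the argument list for a consect.sh run (hypothetical helper)."""
    if len(corrected) < 2:
        # Consect needs the original reads plus at least two corrected files.
        raise ValueError("need the original file plus at least 2 corrected files")
    command = [
        "consect.sh",  # assumption: the script is on PATH
        "in=" + ",".join([original, *corrected]),  # original file must come first
        "out=" + out,
    ]
    if cq:
        command.append("cq=t")
    if xmx:
        command.append("-Xmx" + xmx)  # e.g. "20g" or "200m"
    return command


if __name__ == "__main__":
    cmd = build_consect_command(
        "raw.fq", ["corrected1.fq", "corrected2.fq"], "consensus.fq", cq=True
    )
    print(" ".join(cmd))
    # To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Checking the file count before launching gives an early, readable error instead of a failed run.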
Algorithm Details
Conservative Consensus Strategy
Consect implements a conservative consensus algorithm designed specifically for substitution corrections. The algorithm processes reads position by position, applying corrections only when all error-correction tools agree on the same base change.
Position-by-Position Analysis
For each position in a read, the algorithm:
- Counts base votes: Examines the base called by each correction tool at that position
- Requires unanimity: Only applies corrections when all tools agree on the replacement base
- Handles disagreements: When tools disagree, the original base is preserved
- Manages variable lengths: Handles cases where some reads are shorter than others
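The steps above can be illustrated with a short Python sketch. This is not Consect's actual code (Consect is a Java program). The function name and the treatment of length differences are assumptions: here only positions present in every version of the read are compared, and a position counts as a disagreement when at least one tool changed the base but the tools did not all choose the same base.

```python
def consensus_bases(original, corrected):
    """Return (consensus, corrections, disagreements) for one read.

    original  -- uncorrected base string
    corrected -- list of base strings, one per correction tool
    """
    if len(corrected) < 2:
        raise ValueError("need at least two corrected versions")
    out = list(original)
    corrections = 0
    disagreements = 0
    # Assumption: only positions present in every version are compared;
    # anything beyond the shortest version keeps the original base.
    shared = min(len(original), *(len(c) for c in corrected))
    for i in range(shared):
        votes = {c[i] for c in corrected}
        if votes == {original[i]}:
            continue  # no tool changed this base
        if len(votes) == 1:
            out[i] = votes.pop()  # unanimous replacement: accept it
            corrections += 1
        else:
            disagreements += 1  # tools differ: keep the original base
    return "".join(out), corrections, disagreements


if __name__ == "__main__":
    print(consensus_bases("ACGT", ["ACCT", "ACCT"]))  # both tools agree
    print(consensus_bases("ACGT", ["ACCT", "ACGT"]))  # tools disagree
```

In the first call both tools replace G with C, so the change is accepted. In the second only one tool makes the change, so the original base is kept and the position is counted as a disagreement.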
Quality Score Management
When the cq=t parameter is enabled:
- Quality scores at corrected positions are set to the maximum quality from all contributing tools
- Original quality scores are preserved at uncorrected positions
- This provides confidence estimates for the consensus corrections
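The quality rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not Consect's implementation: update_qualities is a hypothetical name, qualities are given as lists of integer Phred scores, and a position counts as corrected when the consensus base differs from the original base.

```python
def update_qualities(orig_bases, orig_quals, consensus, corrected_quals):
    """Return new qualities: max tool quality at corrected positions.

    orig_quals      -- list of int Phred scores for the original read
    corrected_quals -- list of such lists, one per correction tool
    """
    out = list(orig_quals)
    for i, (o, c) in enumerate(zip(orig_bases, consensus)):
        if o != c:  # position was changed by the consensus
            # Take the highest quality any tool reported at this position;
            # fall back to the original score if no tool covers it.
            out[i] = max(
                (q[i] for q in corrected_quals if i < len(q)),
                default=orig_quals[i],
            )
    return out


if __name__ == "__main__":
    new_quals = update_qualities(
        "ACGT", [30, 30, 10, 30], "ACCT",
        [[30, 30, 35, 30], [30, 30, 38, 30]],
    )
    print(new_quals)
```

Only the third position changes in this example, so only its score is replaced. All other scores are copied from the original read.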
Performance Characteristics
- Memory usage: Scales linearly with read length and number of input files
- Processing speed: Limited by I/O since all files must be read synchronously
- Accuracy: High precision (few false positives) due to conservative approach, but lower recall (some true corrections may be missed)
Statistical Reporting
Consect provides comprehensive statistics including:
- Total errors corrected and disagreements found
- Number of reads with corrections vs. disagreements
- Classification of reads as fully corrected, partially corrected, uncorrected, or error-free
- Processing throughput (reads and bases per second)
Design Limitations
Important constraints of the consensus algorithm:
- Substitutions only: Designed for base substitution corrections, not insertions or deletions
- Read order dependency: All input files must contain reads in identical order
- Minimum input requirement: Requires at least 3 files (original + 2 corrected versions)
- Conservative bias: Favors precision over recall, potentially missing some valid corrections
Output Statistics
Consect provides detailed statistics about the consensus process:
Correction Metrics
- Errors Corrected: Total number of positions where unanimous corrections were applied
- Disagreements: Total number of positions where correction tools disagreed
Read-Level Statistics
- Reads With Corrections: Number of reads that received at least one correction
- Reads With Disagreements: Number of reads where tools disagreed on at least one position
- Reads Fully Corrected: Reads with corrections but no disagreements
- Reads Partly Corrected: Reads with both corrections and disagreements
- Reads Not Corrected: Reads with disagreements but no corrections applied
- Reads Error Free: Reads with no corrections or disagreements
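The four read-level categories follow directly from each read's correction and disagreement counts. A minimal Python sketch of that mapping is shown here. The function name and label strings are illustrative only, and are not Consect's output format.

```python
from collections import Counter


def classify_read(corrections, disagreements):
    """Map per-read counts to one of the four read-level categories."""
    if corrections > 0:
        return "partly corrected" if disagreements > 0 else "fully corrected"
    return "not corrected" if disagreements > 0 else "error free"


if __name__ == "__main__":
    # (corrections, disagreements) per read; made-up values for illustration
    per_read = [(2, 0), (1, 3), (0, 1), (0, 0), (0, 0)]
    tally = Counter(classify_read(c, d) for c, d in per_read)
    for label, count in sorted(tally.items()):
        print(label, count)
```

Each read falls into exactly one category, so the four counts sum to the total number of reads processed.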
Best Practices
Input File Preparation
- Ensure all input files contain reads in identical order
- Use the same file format (FASTQ recommended) for all inputs
- Verify that correction tools used similar parameters and approaches
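Read order can be checked before running Consect. The sketch below is a hypothetical helper, not part of BBTools. It assumes 4-line FASTQ records, optional gzip compression indicated by a .gz suffix, and that each correction tool keeps the read name (the first whitespace-delimited token of the header) unchanged. If a tool rewrites names, the comparison must be adapted.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each record in a 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.strip():  # header lines only
                fields = line[1:].split()  # drop '@', ignore header comments
                yield fields[0] if fields else ""


def first_order_mismatch(paths):
    """Return the 0-based index of the first record whose names differ.

    Returns None when all files list the same names in the same order.
    A file that ends early also counts as a mismatch.
    """
    streams = [read_names(p) for p in paths]
    for index, names in enumerate(zip_longest(*streams)):
        if len(set(names)) != 1:
            return index
    return None


if __name__ == "__main__":
    import sys
    mismatch = first_order_mismatch(sys.argv[1:])
    print("order OK" if mismatch is None else f"first mismatch at record {mismatch}")
```

The files are streamed in lockstep, so the check stops at the first mismatching record and does not load whole files into memory.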
Tool Selection
- Choose diverse error-correction algorithms for better consensus reliability
- Consider tools with different algorithmic approaches (k-mer based, overlap based, etc.)
- Avoid tools that perform aggressive trimming or filtering that could affect read order
Parameter Tuning
- Enable cq=t when quality scores are important for downstream analysis
- Use verbose=t for initial runs to understand correction patterns
- Monitor memory usage with appropriate -Xmx settings for large datasets
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org