FilterQC

Script: filterqc.sh
Author: Shijie Yao
Pipeline: Python-based RQC Filter

FASTQ filter pipeline that implements Rolling Quality Control (RQC) filtering for sequencing data. This wrapper script provides a simplified interface to the RQC filtering pipeline, which performs adapter trimming, contaminant removal, and quality control.

Basic Usage

filterqc.sh in=<file> out=<dir>

FilterQC processes a single FASTQ or FASTQ.gz file through the RQC filtering workflow and writes the filtered results to the specified directory.

Parameters

FilterQC accepts a minimal set of parameters to configure the filtering pipeline. The underlying Python pipeline implements the full RQC filtering workflow with automatic parameter selection.

Input/Output Parameters

in=file
The input FASTQ or FASTQ.gz file to be processed. This is the raw sequencing data that will undergo quality control filtering. The file may be gzip-compressed (.gz extension).
out=dir
The output directory where filtered results and quality control reports will be written. The directory will be created if it does not exist. All output files from the filtering pipeline will be placed in this directory.
rqcfilterdata=dir
Path to the RQCFilterData directory, which holds the contamination databases, adapter sequences, and quality control references used throughout the filtering process.

Optional Parameters

qc
Enable quality control reporting on the filtered output. When specified, additional QC statistics and reports are generated for the filtered data so that its quality can be evaluated before downstream analysis.
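
The parameters above combine into a single command line. The following is a minimal sketch of that assembly, assuming a hypothetical helper named filterqc_args that is not part of filterqc.sh. It uses only the documented in=, out=, rqcfilterdata=, and qc parameters, and it prints the argument string rather than running anything. Paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of filterqc.sh): prints the argument string
# for a run. Arguments: input file, output directory, optional RQCFilterData
# path, and the literal word "qc" to enable QC reporting.
filterqc_args() {
    local input="$1" outdir="$2" rqcdata="${3:-}" qc="${4:-}"

    # in= and out= are always emitted.
    printf 'in=%s out=%s' "$input" "$outdir"

    # rqcfilterdata= is emitted only when a path was given.
    if [ -n "$rqcdata" ]; then
        printf ' rqcfilterdata=%s' "$rqcdata"
    fi

    # qc is a bare flag with no value.
    if [ "$qc" = "qc" ]; then
        printf ' qc'
    fi
    printf '\n'
}
```

For example, filterqc_args reads.fastq.gz outdir "" qc prints the arguments for a run with QC reporting and no explicit RQCFilterData path.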

Examples

Basic Filtering

filterqc.sh in=raw_reads.fastq.gz out=filtered_output

Processes a compressed FASTQ file through the RQC filtering pipeline, outputting filtered results to the 'filtered_output' directory.

Filtering with Custom RQC Data Path

filterqc.sh in=sample.fastq out=results rqcfilterdata=/path/to/RQCFilterData

Runs filtering with a custom path to the RQCFilterData directory containing the necessary reference databases and resources.

Filtering with Quality Control Reporting

filterqc.sh in=illumina_reads.fastq.gz out=qc_filtered rqcfilterdata=/data/RQCFilterData qc

Performs comprehensive filtering with additional quality control reporting enabled, generating detailed statistics and metrics for the filtered output data.

Pipeline Integration

# Process multiple samples
for sample in *.fastq.gz; do
    base=$(basename "$sample" .fastq.gz)
    filterqc.sh in="$sample" out="filtered_${base}" qc
done

Batch-processes multiple FASTQ files through the FilterQC pipeline with quality control reporting, writing each sample to its own output directory.
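
The loop above matches only .fastq.gz files. Since the input may also be an uncompressed FASTQ file, the sample name can be derived in a way that covers both cases. The following is a minimal sketch, assuming a hypothetical helper named sample_base that is not part of filterqc.sh:

```shell
# Hypothetical helper (not part of filterqc.sh): strips the directory and a
# trailing .fastq or .fastq.gz extension from a path.
sample_base() {
    local name
    name=$(basename "$1")
    name=${name%.gz}       # drop .gz if present
    name=${name%.fastq}    # drop .fastq if present
    printf '%s\n' "$name"
}

# Possible use in a batch loop (not executed here):
#   for sample in *.fastq *.fastq.gz; do
#       [ -e "$sample" ] || continue   # skip patterns that matched nothing
#       filterqc.sh in="$sample" out="filtered_$(sample_base "$sample")" qc
#   done
```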

Algorithm Details

RQC Filtering Pipeline

FilterQC implements a simplified interface to the Rolling Quality Control (RQC) filtering system, which coordinates multiple BBTools programs for sequencing data quality control. The underlying system performs multiple filtering operations in sequence:

Core Filtering Operations

Python Pipeline Architecture

The FilterQC script calls a Python-based pipeline (filter.py) located in the pytools directory. This pipeline coordinates multiple BBTools programs to achieve comprehensive filtering:

Quality Control Integration

When the 'qc' parameter is specified, the pipeline generates comprehensive quality control reports including:

Output Organization

The pipeline produces organized output in the specified directory:

Performance Characteristics

FilterQC processes sequencing data using several implementation characteristics:

RQCFilterData Dependencies

The filtering pipeline requires access to the RQCFilterData directory containing:

Integration with Larger Workflows

FilterQC integrates with larger genomic analysis workflows through:

Output Files

FilterQC generates multiple output files in the specified output directory:

Primary Output

Quality Control Reports (when qc enabled)

Intermediate Files

Best Practices

Input Preparation

Resource Configuration

Quality Control

Pipeline Integration

Troubleshooting

Common Issues

File Not Found Error
Ensure the input FASTQ file exists and is accessible. Check file permissions and path specifications.
RQCFilterData Path Error
Verify the RQCFilterData directory exists and contains the required reference databases and adapter libraries.
Insufficient Disk Space
Ensure adequate disk space is available for both temporary processing files and final output.
Python Pipeline Errors
Check that Python is properly installed and the pytools directory is accessible with the filter.py script.
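
The checks above can be run before starting a job. The following is a minimal sketch, assuming hypothetical helpers named preflight and free_kb that are not part of filterqc.sh. It also assumes the interpreter is available as python3 or python on the PATH, which this document does not specify. It does not check the contents of the RQCFilterData directory or the location of filter.py.

```shell
# Hypothetical pre-flight checks (not part of filterqc.sh).

# Reports the free space, in kilobytes, on the filesystem holding a path.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Arguments: input file, RQCFilterData directory.
# Prints one message per problem found and returns non-zero if any was found.
preflight() {
    local input="$1" rqcdata="$2" status=0

    # File Not Found Error: the input must exist and be readable.
    if [ ! -r "$input" ]; then
        echo "input not readable: $input"
        status=1
    fi

    # RQCFilterData Path Error: the resource directory must exist.
    if [ ! -d "$rqcdata" ]; then
        echo "RQCFilterData directory missing: $rqcdata"
        status=1
    fi

    # Python Pipeline Errors: an interpreter must be on the PATH
    # (the names python3 and python are assumptions).
    if ! command -v python3 >/dev/null 2>&1 && ! command -v python >/dev/null 2>&1; then
        echo "no python interpreter found on PATH"
        status=1
    fi

    return "$status"
}
```

A run could then be guarded with preflight raw_reads.fastq.gz /path/to/RQCFilterData && filterqc.sh in=raw_reads.fastq.gz out=filtered_output rqcfilterdata=/path/to/RQCFilterData, and free_kb . can be compared against the expected output size before starting.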

Performance Optimization

Related Tools

FilterQC works in conjunction with other BBTools programs:

Contact and Support

For questions and support regarding FilterQC: