ReadQC

Script: readqc.sh
Package: pytools
Backend: readqc.py

Read QC pipeline for quality assessment of fastq files. Provides HTML-formatted reports with quality metrics and visualization.

Basic Usage

readqc.sh in=<file> out=<dir>

ReadQC reads sequencing data in fastq format, either uncompressed or gzipped, and writes HTML quality control reports to the given output directory.

Parameters

ReadQC's parameters cover only input specification and output directory configuration. The tool automatically generates HTML reports and skips BLAST analysis for faster processing.

Input/Output Parameters

in=file
The input file in fastq format, either uncompressed (.fastq) or gzip-compressed (.fastq.gz). This parameter is required.
out=dir
The output directory where quality control reports will be generated. The directory will be created if it doesn't exist. HTML reports and associated files will be placed in this directory. This parameter is required.
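
Because both parameters are required, it can help to check them before launching a run. The following is a minimal sketch, not part of ReadQC: the function name readqc_preflight is hypothetical, and the check on the file name suffix is an assumption based on the two input forms described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ReadQC): validate arguments
# before handing them to readqc.sh.
readqc_preflight() {
    local input="$1" outdir="$2"

    # Both in= and out= are required.
    if [ -z "$input" ] || [ -z "$outdir" ]; then
        echo "usage: readqc_preflight <file> <dir>" >&2
        return 2
    fi

    # Assumption: inputs are named *.fastq or *.fastq.gz.
    case "$input" in
        *.fastq|*.fastq.gz) ;;
        *) echo "error: expected .fastq or .fastq.gz: $input" >&2; return 1 ;;
    esac

    # The input file must exist and be readable.
    if [ ! -r "$input" ]; then
        echo "error: cannot read $input" >&2
        return 1
    fi
    return 0
}
```

A run could then be guarded with: readqc_preflight sample.fastq qc_results && readqc.sh in=sample.fastq out=qc_results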

Examples

Basic Quality Control

readqc.sh in=sample.fastq out=qc_results

Performs quality control analysis on sample.fastq and generates HTML reports in the qc_results directory.
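
After a run, the reports can be located by searching the output directory. This sketch assumes the HTML reports carry an .html extension; the function name list_html_reports is hypothetical and the exact report file names are not specified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list HTML files found under a ReadQC output directory.
list_html_reports() {
    local outdir="$1"

    # Fail early if the output directory does not exist.
    if [ ! -d "$outdir" ]; then
        echo "error: no such directory: $outdir" >&2
        return 1
    fi

    # Assumption: report files end in .html; search subdirectories too.
    find "$outdir" -type f -name '*.html' | sort
}
```

For the example above, list_html_reports qc_results would print the path of each HTML file under qc_results, one per line.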

Compressed Input File

readqc.sh in=reads.fastq.gz out=quality_reports

Processes a gzipped fastq file and creates quality assessment reports in the quality_reports directory.

Multiple Sample Analysis

# Process multiple samples (variables are quoted so file names with spaces work)
for sample in *.fastq.gz; do
    base=$(basename "$sample" .fastq.gz)
    readqc.sh in="$sample" out="qc_${base}"
done

Example shell loop to process multiple fastq files, creating separate output directories for each sample.
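
When a directory mixes compressed and uncompressed inputs, the output directory name can be derived in one place. This is a small sketch; the function name qc_outdir_for and the qc_ prefix convention are illustrative choices, not ReadQC behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a fastq path to a per-sample output directory name.
qc_outdir_for() {
    local name="${1##*/}"   # drop any leading directory components
    name="${name%.gz}"      # drop a trailing .gz, if present
    name="${name%.fastq}"   # drop a trailing .fastq, if present
    printf 'qc_%s\n' "$name"
}

# Usage sketch (readqc.sh itself is not invoked here):
#   for sample in *.fastq *.fastq.gz; do
#       [ -e "$sample" ] || continue   # skip patterns that matched nothing
#       readqc.sh in="$sample" out="$(qc_outdir_for "$sample")"
#   done
```

Note that sample.fastq and sample.fastq.gz would map to the same directory, so inputs that differ only by compression should be run separately.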

Algorithm Details

Quality Assessment Pipeline

ReadQC implements a quality control pipeline for fastq files. The tool performs the following analyses:

Processing Strategy

The readqc.py backend processes fastq files using the following implementation approach:

Report Contents

The generated HTML reports typically include:

Output Files

ReadQC generates several output files in the specified output directory:

Performance Considerations

Memory Usage

ReadQC uses a streaming approach for memory management:

Processing Time

Processing time depends on several factors:

Technical Notes

File Format Requirements

Error Handling

Dependencies

ReadQC requires:

Author Information

Written by: Shijie Yao

Last Modified: March 22, 2018

Contact: For specific questions about ReadQC, contact Shijie Yao at syao@lbl.gov

Support

For questions and support: