process_lane_v007.sh

Script: process_lane_v007.sh
Author: Brian Bushnell
Version: 0.07
Last Updated: August 17, 2024

Comprehensive NovaSeq full-lane analysis pipeline that gathers quality metrics, performs PhiX-based recalibration, and generates barcode counts for downstream per-library processing. Designed as a non-destructive analysis tool that preserves original data while generating essential quality control files.

Purpose

This pipeline analyzes complete Illumina sequencing lanes to generate quality metrics and recalibration data without modifying the original sequencing files. It produces standardized output files (PHIX, TILEDUMP, QHIST, COUNTS) that integrate with the Jamo system for downstream per-library processing and quality assessment.

Prerequisites

Configuration Variables

The pipeline uses environment variables that must be configured before execution:

Lane Configuration

# Lane identification and file paths
LANEID=ABXYZ                   # Lane-specific identifier
RAW=ABXYZ.1.fq.gz              # Lane FASTQ filename
RAWPATH=/foo/bar/"$RAW"        # Full input path
OUT="$PSCRATCH"/"$LANEID"      # Output directory for large files
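A mistyped path is cheaper to catch before any BBTools step starts. The following is a minimal sketch, assuming the variables above are set in the calling shell; `check_lane_config` is a hypothetical helper, not part of process_lane_v007.sh.

```shell
# Hypothetical pre-flight check for the lane configuration variables.
check_lane_config() {
    local name
    # Every lane variable must be non-empty (bash indirect expansion).
    for name in LANEID RAW RAWPATH PSCRATCH; do
        if [ -z "${!name}" ]; then
            echo "error: $name is not set" >&2
            return 1
        fi
    done
    # The input lane file must exist and be readable.
    if [ ! -r "$RAWPATH" ]; then
        echo "error: cannot read $RAWPATH" >&2
        return 1
    fi
    # RAWPATH is expected to end with the RAW filename.
    if [ "$(basename "$RAWPATH")" != "$RAW" ]; then
        echo "error: RAWPATH does not end in $RAW" >&2
        return 1
    fi
    # Derive and create the output directory for large files.
    OUT="$PSCRATCH/$LANEID"
    mkdir -p "$OUT"
}
```

Calling `check_lane_config || exit 1` near the top of a wrapper stops the run before any output is written.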

System Resources

# Hardware configuration
CORES=64                       # Physical CPU cores
ZL=9                           # Compression level (4 if bgzip unavailable)
MAXRAM=48g                     # About 85% of physical RAM
HIGHRAM=31g                    # Java heap for high-memory steps
LOWRAM=4g                      # Java heap for low-memory steps
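The stage commands use $LOW, $HIGH, $MAX, and $ARGS, which are not defined among the variables above. One plausible derivation, shown as an assumption rather than the script's actual code, turns the RAM settings into Java -Xmx flags and collects shared BBTools flags (threads=, zl=) in a bash array. `ram_fraction`, `physical_ram_kb`, `build_bbtools_flags`, and `BB_ARGS` are hypothetical names.

```shell
# Print PCT percent of TOTAL_KB kilobytes as whole GiB with a "g" suffix,
# suitable for a Java -Xmx value. Integer arithmetic rounds down.
ram_fraction() {
    local total_kb=$1 pct=$2
    echo "$(( total_kb * pct / 100 / 1048576 ))g"
}

# Read MemTotal (in kB) from /proc/meminfo (Linux only).
physical_ram_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Hypothetical derivation of heap flags and shared flags.
build_bbtools_flags() {
    # Fall back to 85% of physical RAM only if MAXRAM is unset.
    MAXRAM=${MAXRAM:-$(ram_fraction "$(physical_ram_kb)" 85)}
    LOW="-Xmx${LOWRAM:-4g}"
    HIGH="-Xmx${HIGHRAM:-31g}"
    MAX="-Xmx$MAXRAM"
    # An array keeps each flag a separate word.
    BB_ARGS=("threads=${CORES:-64}" "zl=${ZL:-9}")
}
```

With the array, a stage would be invoked as `bbduk.sh "$LOW" "${BB_ARGS[@]}" ...`. A quoted string such as "$ARGS" reaches the tool as a single argument, so it is only safe if it holds exactly one flag.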

Pipeline Stages

Stage 1: PhiX Isolation and Processing

# Filter PhiX reads from raw lane data
bbduk.sh "$LOW" "$ARGS" ref=phix k=25 hdist=2 in="$RAWPATH" outm=phix.fq.gz

# Adapter trim PhiX reads for accurate alignment
bbduk.sh "$LOW" "$ARGS" in=phix.fq.gz out=phix_trimmed.fq.gz \
    ref=adapters k=23 mink=11 hdist=2 hdist2=0 \
    tbo tpe ktrim=r minlen=100 ordered

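Parameters:

- ref=phix k=25 hdist=2: match 25-mers from the built-in PhiX reference, allowing up to 2 mismatches per k-mer
- outm=phix.fq.gz: write only the reads that matched PhiX
- ref=adapters k=23 mink=11: match 23-mer adapter k-mers, plus shorter k-mers down to 11 at the read end
- hdist=2 hdist2=0: allow 2 mismatches for full-length k-mers and none for the shorter end k-mers
- tbo tpe: also trim adapters by pair overlap, and trim both reads of a pair to the same length
- ktrim=r: trim the adapter match and everything to its right
- minlen=100: discard reads shorter than 100 bp after trimming
- ordered: keep output reads in input order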
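Because outm= keeps only the reads that matched PhiX, comparing its read count with the whole lane's gives a rough PhiX spike-in percentage. A minimal sketch, assuming standard 4-line FASTQ records; `count_fastq_reads` and `phix_percent` are hypothetical helpers.

```shell
# Count FASTQ records in a gzipped file (4 lines per record).
count_fastq_reads() {
    local lines
    lines=$(gzip -cd -- "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print matched/total reads as a percentage with two decimals.
phix_percent() {
    local matched total
    matched=$(count_fastq_reads "$1")
    total=$(count_fastq_reads "$2")
    awk -v m="$matched" -v t="$total" 'BEGIN {
        if (t == 0) print "0.00"; else printf "%.2f\n", 100 * m / t
    }'
}
```

`phix_percent phix.fq.gz "$RAWPATH"` decompresses the entire lane a second time, so on real data it is mainly illustrative. If the lane is interleaved and pairs are kept together, both counts include both mates and the ratio is unaffected.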

Stage 2: PhiX Alignment and Quality Analysis

# Align PhiX reads with comprehensive quality metrics
bbmap.sh "$HIGH" "$ARGS" ref=phix nodisk vslow maxindel=100 \
    in=phix_trimmed.fq.gz outm=phix.sam.gz \
    qhist="$QHIST" qahist=qahist.txt mhist=mhist.txt bhist=bhist.txt ordered

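Key Features:

- ref=phix nodisk: build the PhiX index in memory without writing it to disk
- vslow maxindel=100: use a slower, more sensitive search mode and allow indels up to 100 bp
- outm=phix.sam.gz: write only mapped reads
- qhist: quality histogram by read position
- qahist: reported versus observed quality (quality accuracy)
- mhist: match, substitution, deletion, and insertion rates by read position
- bhist: base composition by read position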
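To confirm that Stage 2 produced alignments, the records in phix.sam.gz can be counted using only the standard SAM FLAG field. This is a sketch with a hypothetical helper `count_mapped_sam`; it counts primary mapped records (FLAG bits 0x4, 0x100, and 0x800 all clear) and reads no BBMap-specific tags.

```shell
# Count primary mapped records in a gzipped SAM file.
# FLAG is column 2: 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary.
count_mapped_sam() {
    gzip -cd -- "$1" | awk -F'\t' '
        /^@/ { next }                        # skip header lines
        {
            f = $2
            unmapped  = int(f / 4) % 2
            secondary = int(f / 256) % 2
            supp      = int(f / 2048) % 2
            if (!unmapped && !secondary && !supp) n++
        }
        END { print n + 0 }'
}
```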

Stage 3: Quality Recalibration Matrix Generation

# Calculate true quality scores using PhiX alignments
calctruequality.sh "$HIGH" "$ARGS" in="$PHIX" usetiles callvars ref=phix

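Features:

- in="$PHIX": reads the PhiX alignments produced in Stage 2
- usetiles: includes tile information in the recalibration data, so tile-specific biases can be corrected
- callvars ref=phix: calls variants against PhiX so genuine differences from the reference are not counted as sequencing errors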
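The "true quality" idea is that a base's quality should reflect its observed error rate against the reference, Q = -10 * log10(errors / bases). The sketch below computes that value for a single bin of counts; `empirical_phred` is hypothetical, the pseudocounts are an assumption of this sketch, and CalcTrueQuality's own matrices are not reproduced.

```shell
# Empirical Phred quality from observed errors E out of N bases.
empirical_phred() {
    awk -v e="$1" -v n="$2" 'BEGIN {
        # Pseudocounts (+1 error, +2 bases) avoid log(0) when no errors
        # are observed; this smoothing is an assumption of the sketch.
        rate = (e + 1) / (n + 2)
        printf "%.1f\n", -10 * log(rate) / log(10)
    }'
}
```

For example, 9 errors in 9998 bases gives a smoothed rate of 10/10000, which is Q30.0.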

Stage 4: Lane-wide Quality Recalibration

# Apply recalibration to entire lane
bbduk.sh "$LOW" "$ARGS" in="$RAWPATH" out="$RECAL" recalibrate usetiles

Applies calculated recalibration matrices to the complete lane data, correcting systematic quality score biases.
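Applying a recalibration table amounts to replacing each reported quality score with the empirical score measured for it. The toy version below uses a single covariate (the reported score) and a two-column table of `reported empirical` pairs; the recalibration in this stage also uses tile information via usetiles, which is not reproduced. `remap_quality` is hypothetical and assumes Phred+33 quality strings, one per input line.

```shell
# Toy recalibration: remap each Phred+33 quality character via a table.
remap_quality() {
    local table=$1
    awk -v tbl="$table" '
        BEGIN {
            # Build a character -> ASCII code lookup for printable chars.
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            # Load "reported empirical" pairs.
            while ((getline line < tbl) > 0) {
                split(line, f, " ")
                map[f[1]] = f[2]
            }
        }
        {
            out = ""
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33
                if (q in map) q = map[q]     # unmapped scores pass through
                out = out sprintf("%c", q + 33)
            }
            print out
        }'
}
```

With a table line `40 37`, the quality string `IIII` (Q40) becomes `FFFF` (Q37).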

Stage 5: Tile Quality Assessment

# Analyze per-tile quality after recalibration
filterbytile.sh "$MAX" "$ARGS" in="$RECAL" dump="$TILEDUMP"

Identifies problematic tiles and generates comprehensive tile quality metrics. This step is most effective when performed on recalibrated data for the complete lane.
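Tile identity comes from the read headers. In the standard Illumina (CASAVA 1.8 and later) format, `@instrument:run:flowcell:lane:tile:x:y`, the tile is the fifth colon-separated field. The sketch below, using a hypothetical helper `reads_per_tile`, only tallies reads per tile; the quality statistics filterbytile.sh computes are not attempted.

```shell
# Count reads per tile from gzipped FASTQ headers, sorted by tile.
reads_per_tile() {
    gzip -cd -- "$1" | awk '
        NR % 4 == 1 {                        # header line of each record
            split($1, f, ":")
            count[f[5]]++                    # field 5 is the tile number
        }
        END { for (t in count) print t "\t" count[t] }' | sort -k1,1
}
```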

Stage 6: Barcode Quantification

# Count all barcodes in the lane
countbarcodes2.sh "$HIGH" "$ARGS" in="$RAWPATH" counts="$COUNTS"

Generates comprehensive barcode counts for downstream demultiplexing and library quantification.
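In the same Illumina header format, the index sequence is the last colon-separated field of the comment, for example `1:N:0:ACGTACGT+TTGGCCAA`. A minimal tally, most frequent first, can be sketched as below; `count_barcodes` is hypothetical and its tab-separated output is not meant to match the format written by countbarcodes2.sh.

```shell
# Tally index sequences from gzipped FASTQ headers, most frequent first.
count_barcodes() {
    gzip -cd -- "$1" | awk 'NR % 4 == 1 {
            n = split($2, f, ":")            # comment, e.g. 1:N:0:ACGT
            if (n > 0) print f[n]            # last field is the index
        }' | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}
```

On interleaved paired data each pair contributes two counts.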

Output Files

The pipeline generates four essential files for Jamo system integration:

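Required Output Files

- PHIX (phix.sam.gz): PhiX alignments from Stage 2, also the input to Stage 3
- TILEDUMP (tiledump.txt.gz): per-tile quality dump from Stage 5
- QHIST (qhist.txt): quality histogram from Stage 2
- COUNTS (barcodecounts.txt.gz): barcode counts from Stage 6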

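Additional Quality Files

- qahist.txt: reported versus observed quality (Stage 2)
- mhist.txt: match, substitution, and indel rates by read position (Stage 2)
- bhist.txt: base composition by read position (Stage 2)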

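Temporary Files

- phix.fq.gz: PhiX reads isolated in Stage 1
- phix_trimmed.fq.gz: adapter-trimmed PhiX reads from Stage 1
- RECAL: the recalibrated lane from Stage 4, used as input to Stage 5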

Usage Example

# Configure environment variables
export LANEID=NovaSeq_001_Lane1
export RAW=NS001_L1.fastq.gz
export RAWPATH=/data/raw/"$RAW"
export PSCRATCH=/scratch/analysis

# Run the pipeline
./process_lane_v007.sh

# Check completion
if [ -f finished ]; then
    echo "Pipeline completed successfully"
    ls -la phix.sam.gz tiledump.txt.gz qhist.txt barcodecounts.txt.gz
else
    echo "Pipeline did not finish: no 'finished' marker found" >&2
fi

Jamo System Integration

After pipeline completion, the four required output files (PHIX, TILEDUMP, QHIST, COUNTS) must be uploaded to Jamo and associated with the lane.
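Before uploading, it is worth confirming that all four files exist and are non-empty. A minimal sketch, assuming the file names from the usage example; `verify_outputs` is a hypothetical helper, and the upload step itself is left out.

```shell
# Check that the four required outputs exist and are non-empty in DIR.
verify_outputs() {
    local dir=${1:-.} missing=0 f
    for f in phix.sam.gz tiledump.txt.gz qhist.txt barcodecounts.txt.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```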

Performance Characteristics

Platform Compatibility

Quality Control Notes

Related Tools