PlotFlowcell

Script: plotflowcell.sh
Package: hiseq
Class: PlotFlowCell.java

Generates statistics about flowcell positions, analyzes a flowcell for low-quality areas, and removes reads located in those areas. This tool has been entirely superseded by filterbytile and is scheduled for removal after version 39.12.

Basic Usage

plotflowcell.sh in=<input> out=<output>

PlotFlowCell analyzes flowcell position data to identify low-quality tiles and regions based on multiple quality metrics including kmer frequencies, quality scores, and barcode accuracy.

Parameters

Parameters are organized by their function in the flowcell analysis process. All parameters from the shell script are documented here.

Input parameters

in=<file>
Primary input file.
in2=<file>
Second input file for paired reads in two files.
indump=<file>
Specify an already-made dump file to use instead of analyzing the input reads.
reads=-1
Process this number of reads, then quit (-1 means all).
interleaved=auto
Set true/false to override autodetection of the input file as paired interleaved.

Output parameters

out=<file>
Output file for filtered reads.
dump=<file>
Write a summary of quality information by coordinates.

Tile parameters

xsize=500
Initial width of micro-tiles.
ysize=500
Initial height of micro-tiles.
size=
Sets xsize and ysize to the same value.
target=800
Iteratively increase the size of micro-tiles until they contain an average of at least this number of reads.
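As a rough illustration of how xsize and ysize partition a tile, the sketch below parses lane, tile, x, and y from a standard Casava 1.8+ Illumina header and assigns the read to a micro-tile by integer division. The class MicroTileKey and its method are hypothetical, and the header layout is the common Illumina format, not necessarily every variant PlotFlowCell accepts.

```java
import java.util.Arrays;

// Hypothetical sketch: map a read to a micro-tile using its Illumina header.
// Assumes the Casava 1.8+ layout, after the leading '@':
//   instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
// This is not PlotFlowCell's own parser.
class MicroTileKey {

    /** Returns {lane, tile, xBin, yBin} for the given header and micro-tile size. */
    static int[] locate(String header, int xsize, int ysize) {
        // Drop the leading '@' and everything after the first space.
        String id = header.startsWith("@") ? header.substring(1) : header;
        int space = id.indexOf(' ');
        if (space >= 0) {
            id = id.substring(0, space);
        }
        String[] f = id.split(":");
        if (f.length < 7) {
            throw new IllegalArgumentException("Unexpected header: " + header);
        }
        int lane = Integer.parseInt(f[3]);
        int tile = Integer.parseInt(f[4]);
        int x = Integer.parseInt(f[5]);
        int y = Integer.parseInt(f[6]);
        // Integer division assigns the read to an xsize-by-ysize micro-tile.
        return new int[] {lane, tile, x / xsize, y / ysize};
    }

    public static void main(String[] args) {
        String h = "@A00123:8:HXXXXXXX:2:1101:1234:2503 1:N:0:ACGTACGT";
        System.out.println(Arrays.toString(locate(h, 500, 500))); // [2, 1101, 2, 5]
    }
}
```

With the default 500x500 micro-tiles, a read at x=1234, y=2503 falls in column 2, row 5 of tile 1101.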

Other parameters

trimq=-1
If set to a positive number, trim reads to that quality level instead of filtering them.
qtrim=r
If trimq is positive, perform quality trimming on this end of the reads. Values are r, l, and rl for right, left, and both ends.
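To make the trimq/qtrim semantics concrete, here is a simplified sketch that clips bases with Phred quality below trimq from the end(s) selected by qtrim. It is plain end-clipping for illustration only and is not necessarily the trimming algorithm BBTools uses; the class EndTrimSketch is hypothetical.

```java
// Simplified illustration of qtrim/trimq: clip low-quality bases from the
// selected end(s). Hypothetical class; not necessarily BBTools' algorithm.
class EndTrimSketch {

    /** Returns {start, stopExclusive} of the region retained after trimming. */
    static int[] trim(byte[] quals, int trimq, String qtrim) {
        int start = 0;
        int stop = quals.length;
        if (qtrim.contains("l")) {
            // Advance past left-end bases below the threshold.
            while (start < stop && quals[start] < trimq) {
                start++;
            }
        }
        if (qtrim.contains("r")) {
            // Pull back past right-end bases below the threshold.
            while (stop > start && quals[stop - 1] < trimq) {
                stop--;
            }
        }
        return new int[] {start, stop};
    }

    public static void main(String[] args) {
        byte[] q = {5, 12, 30, 35, 33, 28, 9, 2};
        int[] r = trim(q, 20, "rl");
        System.out.println(r[0] + " " + r[1]); // 2 6
    }
}
```

With qtrim=rl and trimq=20, the two low-quality bases at each end are removed and bases 2 through 5 are kept.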

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 GB of RAM; -Xmx200m will specify 200 MB. The max is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Advanced Parameters

Additional parameters available through the Java implementation but not exposed in the shell script:

Kmer Analysis Parameters

verbose=false
Print verbose messages during processing.
pound=true
Enable pound character processing in headers.
loadkmers=true
Load kmers for quality analysis.
allkmers=false
If true, process every kmer in each read (equivalent to kmersperread=0); if false, sample kmersperread kmers per read (default 1).
kmersperread=1
Number of kmers to sample per read for analysis.
multithreaded=false
Enable multithreaded loading and filling operations.
multiload=false
Enable multithreaded kmer loading.
multifill=false
Enable multithreaded tile filling.
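The difference between allkmers and kmersperread can be sketched as follows: kmers are 2-bit encoded, and either all of them or an evenly spaced subset is kept. The class KmerSampler, the kmer length used in the example, and the even-spacing rule are assumptions for illustration, not PlotFlowCell's actual sampling logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of allkmers/kmersperread. K and the spacing rule are
// illustrative assumptions.
class KmerSampler {

    /** 2-bit encodes every valid kmer (k < 32); N or other symbols restart the window. */
    static long[] encodeAll(String bases, int k) {
        List<Long> out = new ArrayList<>();
        long mask = (1L << (2 * k)) - 1;
        long kmer = 0;
        int len = 0; // number of valid bases in the current window
        for (int i = 0; i < bases.length(); i++) {
            int code;
            switch (bases.charAt(i)) {
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default: code = -1; // N or any other symbol
            }
            if (code < 0) {
                len = 0;
                kmer = 0;
                continue;
            }
            kmer = ((kmer << 2) | code) & mask;
            if (++len >= k) {
                out.add(kmer);
            }
        }
        long[] arr = new long[out.size()];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = out.get(i);
        }
        return arr;
    }

    /** kmersPerRead <= 0 keeps every kmer; otherwise keeps an evenly spaced subset. */
    static long[] sample(String bases, int k, int kmersPerRead) {
        long[] all = encodeAll(bases, k);
        if (kmersPerRead <= 0 || all.length <= kmersPerRead) {
            return all;
        }
        long[] picked = new long[kmersPerRead];
        for (int i = 0; i < kmersPerRead; i++) {
            picked[i] = all[(int) ((long) i * all.length / kmersPerRead)];
        }
        return picked;
    }

    public static void main(String[] args) {
        String read = "ACGTACGTAC";
        System.out.println(sample(read, 4, 0).length); // 7
        System.out.println(sample(read, 4, 1).length); // 1
    }
}
```

Sampling one kmer per read keeps the table small at the cost of per-tile resolution, which is presumably why the default is kmersperread=1.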

Bloom Filter Parameters

bloom=false
Use Bloom filter for kmer counting instead of hash tables.
bits=4
Bits per cell in Bloom filter (cbits parameter).
hashes=3
Number of hash functions for Bloom filter.

Reference Alignment Parameters

ref=phix
Reference sequence for contamination detection (default PhiX).
minid=0.65
Minimum identity for reference alignment (65%).
kalign=19
Kmer length for reference alignment.

Barcode Analysis Parameters

expectedbarcodes=null
File containing expected barcodes for validation.
shortheader=false
Use short headers in output.
longheader=true
Use long headers in output (default).

Quality Control Parameters

minpolyg=0
Minimum poly-G tract length to track.
trackcycles=false
Track sequencing cycle information.
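A minimal sketch of what tracking poly-G tracts with minpolyg might involve: find the longest run of G in a read and compare it with the minimum length. Whether PlotFlowCell scans the whole read or only the 3' end, and whether minpolyg=0 disables tracking, are assumptions here; PolyGSketch is a hypothetical name.

```java
// Hypothetical sketch of minpolyg. Scanning the whole read and treating
// minpolyg=0 as "disabled" are assumptions, not confirmed behavior.
class PolyGSketch {

    /** Length of the longest consecutive run of 'G'. */
    static int longestPolyG(String bases) {
        int best = 0;
        int run = 0;
        for (int i = 0; i < bases.length(); i++) {
            run = (bases.charAt(i) == 'G') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /** True if tracking is enabled and the read has a G run of at least minPolyG. */
    static boolean hasPolyG(String bases, int minPolyG) {
        return minPolyG > 0 && longestPolyG(bases) >= minPolyG;
    }

    public static void main(String[] args) {
        String read = "ACGTTA" + "GGGG" + "GGGG" + "GGGG";
        System.out.println(longestPolyG(read)); // 12
        System.out.println(hasPolyG(read, 10)); // true
    }
}
```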

Examples

Basic Flowcell Analysis

plotflowcell.sh in=reads.fastq dump=flowcell_stats.txt

Analyze flowcell positions and write statistics to dump file.

Paired-end Analysis with Custom Tile Size

plotflowcell.sh in=R1.fastq in2=R2.fastq size=1000 target=1000 dump=stats.txt

Process paired reads starting from 1000x1000 micro-tiles, enlarging them as needed until they average at least 1000 reads each.

Using Pre-computed Dump File

plotflowcell.sh indump=previous_stats.txt target=500

Load existing statistics and adjust tile sizes to target 500 reads per tile.

Full Analysis with Bloom Filter

plotflowcell.sh in=reads.fastq bloom=t bits=6 hashes=4 multithreaded=t dump=analysis.txt

Use Bloom filter for kmer analysis with 6 bits per cell and 4 hash functions, with multithreading enabled.

Quality Trimming Mode

plotflowcell.sh in=reads.fastq trimq=20 qtrim=rl out=trimmed.fastq

Trim reads to quality 20 on both ends instead of filtering them, and write the results to trimmed.fastq.

Algorithm Details

Flowcell Analysis Strategy

PlotFlowCell implements a multi-metric analysis framework using HashArray1D with 31-way partitioning and optional BloomFilter integration to analyze sequencing flowcells and identify problematic regions:

Micro-tile Organization

The flowcell is divided into micro-tiles with configurable dimensions (xsize × ysize, default 500×500). The tool uses an adaptive sizing strategy:
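A minimal sketch of such an adaptive strategy, driven by target=, is shown below. It is not PlotFlowCell's code: the class AdaptiveTileSize is hypothetical, the size grows by a fixed step (an assumption), and the average is taken over occupied micro-tiles only (also an assumption).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of adaptive micro-tile sizing. The fixed growth step
// and averaging over occupied micro-tiles are assumptions.
class AdaptiveTileSize {

    /** Average number of reads per occupied size-by-size micro-tile. */
    static double averageReadsPerMicroTile(int[][] xy, int size) {
        Set<Long> occupied = new HashSet<>();
        for (int[] p : xy) {
            // Pack the (xBin, yBin) pair into one long key; bins are non-negative.
            long key = ((long) (p[0] / size) << 32) | (p[1] / size);
            occupied.add(key);
        }
        return occupied.isEmpty() ? 0 : (double) xy.length / occupied.size();
    }

    /** Grows the micro-tile size until the average reaches target or maxSize is hit. */
    static int chooseSize(int[][] xy, int initialSize, int target, int step, int maxSize) {
        int size = initialSize;
        while (size < maxSize && averageReadsPerMicroTile(xy, size) < target) {
            size += step;
        }
        return size;
    }

    public static void main(String[] args) {
        // A 100x100 grid of reads spaced 10 units apart (coordinates 0..990).
        int[][] xy = new int[10000][];
        int n = 0;
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                xy[n++] = new int[] {i * 10, j * 10};
            }
        }
        System.out.println(chooseSize(xy, 50, 100, 50, 5000)); // 100
    }
}
```

At size 50 each micro-tile holds 25 reads, so the size grows to 100, where each holds 100 reads and the target is met.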

Multi-threaded Processing Architecture

The implementation uses a parallel processing design with WorkerThread instances:
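The general worker pattern can be sketched with per-task tallies that are merged once all tasks finish, so no locking is needed while counting. This uses java.util.concurrent.ExecutorService rather than PlotFlowCell's WorkerThread class, whose internals are not described here; ParallelTileCounts and the batch layout (arrays of micro-tile keys) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: count reads per micro-tile key in parallel, using
// private per-task maps that are merged on the calling thread.
class ParallelTileCounts {

    static Map<Long, Integer> count(List<long[]> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Long, Integer>>> futures = new ArrayList<>();
            for (long[] batch : batches) {
                futures.add(pool.submit(() -> {
                    // Task-local tally: no shared state is touched here.
                    Map<Long, Integer> local = new HashMap<>();
                    for (long key : batch) {
                        local.merge(key, 1, Integer::sum);
                    }
                    return local;
                }));
            }
            // Merge the per-task results once each task completes.
            Map<Long, Integer> total = new HashMap<>();
            for (Future<Map<Long, Integer>> f : futures) {
                f.get().forEach((k, v) -> total.merge(k, v, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> batches = List.of(new long[] {1, 2, 2}, new long[] {2, 3});
        System.out.println(count(batches, 2).get(2L)); // 3
    }
}
```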

Quality Metrics Collection

For each micro-tile, the algorithm computes quality statistics through MicroTile.add() methods:
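A per-micro-tile accumulator in the spirit of an add() method might look like the sketch below. It is not BBTools' MicroTile class: the name TileStatsSketch and its fields (read count, base count, summed Phred scores, and summed error probabilities 10^(-Q/10)) are illustrative assumptions.

```java
// Hypothetical per-micro-tile accumulator; not BBTools' MicroTile class.
class TileStatsSketch {
    long reads;
    long bases;
    double qualitySum; // sum of Phred scores over all bases
    double errorSum;   // sum of per-base error probabilities, 10^(-Q/10)

    /** Adds one read's quality scores to this tile's totals. */
    void add(byte[] quals) {
        reads++;
        for (byte q : quals) {
            bases++;
            qualitySum += q;
            errorSum += Math.pow(10, -q / 10.0);
        }
    }

    double averageQuality() {
        return bases == 0 ? 0 : qualitySum / bases;
    }

    /** Mean per-base error probability implied by the quality scores. */
    double expectedErrorRate() {
        return bases == 0 ? 0 : errorSum / bases;
    }

    public static void main(String[] args) {
        TileStatsSketch t = new TileStatsSketch();
        t.add(new byte[] {30, 30, 10, 10});
        System.out.println(t.averageQuality()); // 20.0
    }
}
```

Averaging error probabilities rather than Phred scores weights low-quality bases more heavily: the example averages Q20, yet its expected error rate is about 0.05 rather than the 0.01 that Q20 alone would imply.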

Kmer-based Quality Assessment
Bloom Filter Mode

Alternative analysis using space-efficient BloomFilter class:
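To illustrate what bits= (bits per cell) and hashes= control, here is a minimal counting Bloom filter: each insertion increments one cell per hash function, counters saturate at 2^bits - 1, and a lookup returns the minimum over the probed cells. The class CountingBloomSketch, the packing of floor(64 / bits) cells per long, and the SplitMix64-style hash mixing are assumptions; this is not BBTools' BloomFilter class.

```java
// Minimal counting Bloom filter sketch; not BBTools' BloomFilter class.
// Packing and hash mixing are illustrative assumptions.
class CountingBloomSketch {
    private static final long GOLDEN = 0x9E3779B97F4A7C15L;

    private final long[] words;
    private final long cells;
    private final int bits;
    private final int hashes;
    private final int cellsPerWord;
    private final long maxCount;

    CountingBloomSketch(long cells, int bits, int hashes) {
        if (bits < 1 || bits > 32) {
            throw new IllegalArgumentException("bits must be 1..32");
        }
        this.cells = cells;
        this.bits = bits;
        this.hashes = hashes;
        this.cellsPerWord = 64 / bits;    // cells never straddle two longs
        this.maxCount = (1L << bits) - 1; // counters saturate at this value
        this.words = new long[(int) ((cells + cellsPerWord - 1) / cellsPerWord)];
    }

    // SplitMix64 finalizer used as a 64-bit mixing function.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Cell probed by hash function i for this key.
    private long cellFor(long key, int i) {
        return Math.floorMod(mix(key + (i + 1) * GOLDEN), cells);
    }

    private long get(long cell) {
        int w = (int) (cell / cellsPerWord);
        int shift = (int) (cell % cellsPerWord) * bits;
        return (words[w] >>> shift) & maxCount;
    }

    private void set(long cell, long value) {
        int w = (int) (cell / cellsPerWord);
        int shift = (int) (cell % cellsPerWord) * bits;
        words[w] = (words[w] & ~(maxCount << shift)) | (value << shift);
    }

    /** Increments every probed cell, saturating at 2^bits - 1. */
    void add(long key) {
        for (int i = 0; i < hashes; i++) {
            long c = cellFor(key, i);
            long v = get(c);
            if (v < maxCount) {
                set(c, v + 1);
            }
        }
    }

    /** Minimum over probed cells: never underestimates, but is capped at 2^bits - 1. */
    long count(long key) {
        long min = maxCount;
        for (int i = 0; i < hashes; i++) {
            min = Math.min(min, get(cellFor(key, i)));
        }
        return min;
    }

    public static void main(String[] args) {
        CountingBloomSketch bf = new CountingBloomSketch(1 << 20, 4, 3);
        for (int i = 0; i < 3; i++) {
            bf.add(42L);
        }
        System.out.println(bf.count(42L));
    }
}
```

More bits per cell raise the saturation ceiling but fit fewer cells into the same memory; more hash functions lower the false-positive rate until the table starts to fill.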

Reference Contamination Detection

Optional PhiX contamination analysis through MicroAligner3:
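The roles of kalign= and minid= can be illustrated with a simplified seed-and-extend sketch: index reference kmers of length k, find the first read kmer that hits, and score an ungapped alignment at the implied offset. MicroAligner3's actual algorithm is not reproduced here; SeedIdentitySketch is hypothetical, and gaps and reverse complements are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified seed-and-extend identity check; not MicroAligner3's algorithm.
// Gaps and reverse complements are ignored.
class SeedIdentitySketch {

    /** Maps each reference kmer of length k to its first position. */
    static Map<String, Integer> index(String ref, int k) {
        Map<String, Integer> idx = new HashMap<>();
        for (int i = 0; i + k <= ref.length(); i++) {
            idx.putIfAbsent(ref.substring(i, i + k), i);
        }
        return idx;
    }

    /** Ungapped identity in [0, 1] at the first seed hit, or -1 if no seed hits. */
    static double identity(String read, String ref, Map<String, Integer> idx, int k) {
        for (int i = 0; i + k <= read.length(); i++) {
            Integer refPos = idx.get(read.substring(i, i + k));
            if (refPos == null) {
                continue;
            }
            int offset = refPos - i; // reference coordinate of read base 0
            int matches = 0;
            int compared = 0;
            for (int j = 0; j < read.length(); j++) {
                int r = offset + j;
                if (r < 0 || r >= ref.length()) {
                    continue; // bases overhanging the reference are not scored
                }
                compared++;
                if (read.charAt(j) == ref.charAt(r)) {
                    matches++;
                }
            }
            return compared == 0 ? -1 : (double) matches / compared;
        }
        return -1;
    }

    public static void main(String[] args) {
        String ref = "GATTACAGGCTTAACCGT";
        Map<String, Integer> idx = index(ref, 5);
        double id = identity("TACAGGCTTC", ref, idx, 5);
        System.out.println(id >= 0.65); // true (one mismatch in ten bases)
    }
}
```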

Barcode Quality Assessment

When barcodeStats is configured:
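Checking observed barcodes against an expectedbarcodes list might look like the sketch below: the barcode is taken from the header comment and compared with each expected barcode by Hamming distance. The class BarcodeCheckSketch is hypothetical, and taking the text after the last ':' of the comment (as in "1:N:0:ACGTACGT") is an assumption about the header format.

```java
import java.util.Set;

// Hypothetical barcode check. Assumes the barcode is the text after the last
// ':' in the header comment; dual indexes like "ACGT+TTGG" stay one string.
class BarcodeCheckSketch {

    static String barcodeFromHeader(String header) {
        int space = header.indexOf(' ');
        if (space < 0) {
            return null; // no comment field, so no barcode
        }
        int colon = header.lastIndexOf(':');
        return colon < space ? null : header.substring(colon + 1).trim();
    }

    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            return Integer.MAX_VALUE; // different lengths: treat as no match
        }
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++;
            }
        }
        return d;
    }

    /** Smallest Hamming distance from the observed barcode to any expected one. */
    static int minDistance(String observed, Set<String> expected) {
        int best = Integer.MAX_VALUE;
        for (String e : expected) {
            best = Math.min(best, hamming(observed, e));
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("ACGTACGT", "TTGGCCAA");
        String bc = barcodeFromHeader("@A00123:8:HXXXXXXX:2:1101:1234:2503 1:N:0:ACGTACGA");
        System.out.println(bc + " " + minDistance(bc, expected)); // ACGTACGA 1
    }
}
```

Aggregating the fraction of reads whose minimum distance is nonzero per micro-tile would give a barcode-accuracy metric comparable across tiles.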

Statistical Analysis

The flowcell statistics calculation through FlowCell.calcStats():
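One common way to turn per-tile statistics into "low-quality" calls is to flag tiles whose metric lies far above the flowcell mean. The sketch below does this with a population standard deviation and a configurable z threshold. Both the method and the class OutlierTilesSketch are illustrative assumptions; the source does not state that FlowCell.calcStats() works this way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outlier flagging; not necessarily what FlowCell.calcStats() does.
class OutlierTilesSketch {

    /** Returns indices of tiles whose value exceeds mean + z * (population) stddev. */
    static List<Integer> flagHigh(double[] values, double z) {
        List<Integer> flagged = new ArrayList<>();
        if (values.length == 0) {
            return flagged;
        }
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double var = 0;
        for (double v : values) {
            var += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(var / values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] > mean + z * sd) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Per-tile error rates; the last tile is clearly worse than the rest.
        double[] errorRates = {0.01, 0.011, 0.009, 0.010, 0.05};
        System.out.println(flagHigh(errorRates, 1.5)); // [4]
    }
}
```

Note that with only a handful of tiles a single outlier inflates the standard deviation itself, so the threshold has to be modest; real flowcells have many more micro-tiles.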

Memory Management

Memory usage through ScheduleMaker and partitioned data structures:
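As a back-of-the-envelope illustration of the bits= memory trade-off, the sketch below computes how many counter cells fit in a memory budget, assuming each 64-bit word holds floor(64 / bits) cells. That packing scheme and the class BloomMemorySketch are assumptions, not a description of ScheduleMaker or BBTools' BloomFilter.

```java
// Hypothetical memory arithmetic for packed Bloom filter cells.
// Assumes each 64-bit word holds floor(64 / bits) cells.
class BloomMemorySketch {

    static long cellsForBytes(long bytes, int bits) {
        long words = bytes / 8;
        return words * (64 / bits);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        System.out.println(cellsForBytes(oneGiB, 4)); // 2147483648
        System.out.println(cellsForBytes(oneGiB, 6)); // 1342177280
    }
}
```

Under this packing, going from bits=4 to bits=6 costs more than the 1.5x suggested by the bit count alone, because 64 is not a multiple of 6 and 4 bits per word are left unused.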

Output Formats

The dump file contains MicroTile statistics via FlowCell.dump() method:

Performance Characteristics

PlotFlowCell processing characteristics based on implementation details:

Deprecation Notice

Important: PlotFlowCell has been superseded by the filterbytile tool, which provides improved algorithms and additional features for flowcell quality analysis. PlotFlowCell is scheduled for removal after version 39.12. Users should migrate to filterbytile for new analyses.
