PlotFlowCell

Basic Usage

plotflowcell.sh in=<input> out=<output>

PlotFlowCell analyzes flowcell position data to identify low-quality tiles and regions based on multiple quality metrics including kmer frequencies, quality scores, and barcode accuracy.

Parameters

Parameters are organized by their function in the flowcell analysis process. All parameters from the shell script are documented here.

Input parameters

in=<file>: Primary input file.
in2=<file>: Second input file for paired reads in two files.
indump=<file>: Specify an already-made dump file to use instead of analyzing the input reads.
reads=-1: Process this number of reads, then quit (-1 means all).
interleaved=auto: Set true/false to override autodetection of the input file as paired interleaved.

Output parameters

out=<file>: Output file for filtered reads.
dump=<file>: Write a summary of quality information by coordinates.

Tile parameters

xsize=500: Initial width of micro-tiles.
ysize=500: Initial height of micro-tiles.
size=: Allows setting xsize and ysize to the same value.
target=800: Iteratively increase the size of micro-tiles until they contain an average of at least this number of reads.

Other parameters

trimq=-1: If set to a positive number, trim reads to that quality level instead of filtering them.
qtrim=r: If trimq is positive, perform quality trimming on this end of the reads. Values are r, l, and rl for right, left, and both ends.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 GB of RAM; -Xmx200m will specify 200 MB. The max is typically 85% of physical memory.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Advanced Parameters

Additional parameters available through the Java implementation but not exposed in the shell script:

Kmer Analysis Parameters

verbose=false: Print verbose messages during processing.
pound=true: Enable pound character processing in headers.
loadkmers=true: Load kmers for quality analysis.
allkmers=false: If true, process all kmers in each read (equivalent to kmersperread=0); if false, sample one kmer per read (kmersperread=1).
kmersperread=1: Number of kmers to sample per read for analysis.
multithreaded=false: Enable multithreaded loading and filling operations.
multiload=false: Enable multithreaded kmer loading.
multifill=false: Enable multithreaded tile filling.

Bloom Filter Parameters

bloom=false: Use Bloom filter for kmer counting instead of hash tables.
bits=4: Bits per cell in Bloom filter (cbits parameter).
hashes=3: Number of hash functions for Bloom filter.

Reference Alignment Parameters

ref=phix: Reference sequence for contamination detection (default PhiX).
minid=0.65: Minimum identity for reference alignment (65%).
kalign=19: Kmer length for reference alignment.

Barcode Analysis Parameters

expectedbarcodes=null: File containing expected barcodes for validation.
shortheader=false: Use short headers in output.
longheader=true: Use long headers in output (default).

Quality Control Parameters

minpolyg=0: Minimum poly-G tract length to track.
trackcycles=false: Track sequencing cycle information.

Examples

Basic Flowcell Analysis

plotflowcell.sh in=reads.fastq dump=flowcell_stats.txt

Analyze flowcell positions and write statistics to dump file.

Paired-end Analysis with Custom Tile Size

plotflowcell.sh in=R1.fastq in2=R2.fastq size=1000 target=1000 dump=stats.txt

Process paired reads starting from 1000x1000 micro-tiles, widening them until they contain an average of at least 1000 reads.

Using Pre-computed Dump File

plotflowcell.sh indump=previous_stats.txt target=500

Load existing statistics and adjust tile sizes to target 500 reads per tile.

Full Analysis with Bloom Filter

plotflowcell.sh in=reads.fastq bloom=t bits=6 hashes=4 multithreaded=t dump=analysis.txt

Use Bloom filter for kmer analysis with 6 bits per cell and 4 hash functions, with multithreading enabled.

Quality Trimming Mode

plotflowcell.sh in=reads.fastq trimq=20 qtrim=rl out=trimmed.fastq

Trim both ends of each read to quality 20 instead of filtering reads, and write the result to trimmed.fastq.

Algorithm Details

Flowcell Analysis Strategy

PlotFlowCell analyzes sequencing flowcells with a multi-metric framework to identify problematic regions. Kmers are counted either exactly, in HashArray1D tables with 31-way partitioning, or approximately, in an optional BloomFilter.

Micro-tile Organization

The flowcell is divided into micro-tiles with configurable dimensions (xsize × ysize, default 500×500). The tool uses an adaptive sizing strategy:

Starts with initial tile dimensions
Iteratively increases tile size until tiles contain an average of at least 'target' reads (default 800)
This ensures statistical power for quality assessment
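
The adaptive sizing described above can be sketched as follows. This is a minimal illustration, not the tool's code: the function name, the alternate-doubling growth rule, the max_rounds cap, and averaging over non-empty tiles only are all assumptions made for the example.

```python
from collections import Counter

def widen_to_target(coords, xsize=500, ysize=500, target=800, max_rounds=64):
    """Return (xsize, ysize) after widening micro-tiles toward the target.

    coords is a list of non-negative (x, y) read positions. The growth
    rule (double x, then y, alternating) is a hypothetical choice.
    """
    grow_x = True
    for _ in range(max_rounds):
        # Bin every read into its micro-tile and count reads per tile.
        tiles = Counter((x // xsize, y // ysize) for x, y in coords)
        if len(tiles) <= 1:
            break  # zero or one tile left: widening cannot help further
        if len(coords) / len(tiles) >= target:
            break  # average reads per non-empty tile meets the target
        if grow_x:
            xsize *= 2
        else:
            ysize *= 2
        grow_x = not grow_x
    return xsize, ysize

# 100 reads on a 10x10 grid: four 500x500 tiles hold 25 reads each.
coords = [(x, y) for x in range(0, 1000, 100) for y in range(0, 1000, 100)]
print(widen_to_target(coords, target=50))  # (1000, 500)
```

Larger tiles trade spatial resolution for more reads per tile, which is why growth stops as soon as the average reaches the target.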

Multi-threaded Processing Architecture

The implementation uses a parallel processing design with WorkerThread instances:

Kmer Loading: Uses either HashArray1D with 31-way partitioning (keySets[WAYS=31]) or BloomFilter with configurable cbits (4) and hashes (3)
Tile Filling: Multi-threaded analysis via spawnThreads() method with per-thread FlowCell objects
Accumulation: Thread-safe merging using synchronized accumulate() method and ReadWriteLock
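
The per-thread accumulation pattern can be illustrated with a short sketch. It assumes a simplified model in which each worker counts reads per micro-tile in a private structure and merges it into a shared total under a lock; the names here are hypothetical and do not correspond to the WorkerThread or accumulate() code.

```python
import threading
from collections import Counter

def count_tiles_parallel(coords, threads=4, xsize=500, ysize=500):
    """Count reads per micro-tile using per-thread partial counts."""
    total = Counter()
    lock = threading.Lock()

    def worker(chunk):
        # Each worker fills a private counter, so no locking is needed here.
        local = Counter()
        for x, y in chunk:
            local[(x // xsize, y // ysize)] += 1
        # Merge once at the end while holding the lock.
        with lock:
            total.update(local)

    chunks = [coords[i::threads] for i in range(threads)]
    workers = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total
```

Merging once per thread, rather than locking on every read, keeps contention on the shared structure low.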

Quality Metrics Collection

For each micro-tile, the algorithm computes quality statistics through MicroTile.add() methods:

Kmer-based Quality Assessment

31-mer Analysis: Uses k=31, k2=30 with canonical kmer selection via AminoAcid.reverseComplementBinaryFast()
Hit/Miss Ratios: Tracks kmers above frequency cutoffs (cutoff=2 for all kmers, cutoff=1 for sampled)
Depth Accumulation: Sums kmer depths through mt.depthSum+=value in processTileKmers()
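
A rough sketch of the kmer metrics above, using plain strings rather than the 2-bit encoded kmers the tool works with. Two assumptions are made for illustration: the canonical form is taken as the lexicographically smaller of a kmer and its reverse complement, and a kmer counts as a hit when its depth is at least the cutoff. All function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return one fixed representative of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def count_kmers(reads, k=31):
    """Count canonical kmers across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts

def tile_kmer_stats(read, counts, k=31, cutoff=2):
    """Return (hits, misses, depth_sum) for one read's kmers."""
    hits = misses = depth_sum = 0
    for i in range(len(read) - k + 1):
        depth = counts.get(canonical(read[i:i + k]), 0)
        depth_sum += depth
        if depth >= cutoff:
            hits += 1
        else:
            misses += 1
    return hits, misses, depth_sum
```

A tile whose reads produce mostly misses contains kmers rarely seen elsewhere in the run, which suggests elevated error rates at that position.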

Bloom Filter Mode

Alternative analysis using space-efficient BloomFilter class:

Configurable cbits per cell (default 4) and hash functions (default 3)
BloomFilterCorrector.fillKmers() and fillCounts() for error detection
Estimates unique kmers from filter occupancy via estimateUniqueKmersFromUsedFraction()
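
To make the bits and hashes parameters concrete, here is a minimal counting Bloom filter sketch. It is not the BloomFilter class: the hashing scheme (salted BLAKE2b), the class name, and the method names are assumptions, and the occupancy estimate uses the standard formula n ≈ -(m/h) ln(1 - used fraction), which may differ from what estimateUniqueKmersFromUsedFraction() computes.

```python
import hashlib
import math

class CountingBloom:
    """Toy counting Bloom filter with saturating cells."""

    def __init__(self, cells=1 << 16, bits=4, hashes=3):
        self.cells = [0] * cells
        self.max_count = (1 << bits) - 1  # 4 bits per cell saturate at 15
        self.hashes = hashes

    def _slots(self, kmer):
        # One salted hash per hash function; a set avoids double increments.
        slots = set()
        for seed in range(self.hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                                     salt=bytes([seed])).digest()
            slots.add(int.from_bytes(digest, "big") % len(self.cells))
        return slots

    def add(self, kmer):
        for s in self._slots(kmer):
            if self.cells[s] < self.max_count:
                self.cells[s] += 1

    def count(self, kmer):
        # The minimum over cells never underestimates, up to saturation.
        return min(self.cells[s] for s in self._slots(kmer))

    def estimate_unique(self):
        used = sum(1 for c in self.cells if c) / len(self.cells)
        if used >= 1.0:
            return math.inf
        return -len(self.cells) / self.hashes * math.log(1.0 - used)
```

More bits per cell raise the maximum representable count, while more hash functions lower the false-positive rate at the cost of filling the filter faster.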

Reference Contamination Detection

Optional PhiX contamination analysis through MicroAligner3:

MicroIndex3 with configurable alignK (default 19) for reference indexing
Identity threshold minIdentity (default 0.65f) for alignment scoring
Per-read alignment via mapper.map() calls
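
The roles of kalign and minid can be illustrated with a simplified seed-and-verify sketch. This is not how MicroAligner3 works internally: it assumes exact kmer seeds, ungapped forward-strand comparison only, and identity defined as matching bases divided by read length. All names are hypothetical.

```python
def build_index(ref, k=19):
    """Map each reference kmer to its first position."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], i)
    return index

def best_identity(read, ref, index, k=19):
    """Best ungapped identity over all offsets implied by seed hits."""
    best = 0.0
    for i in range(len(read) - k + 1):
        pos = index.get(read[i:i + k])
        if pos is None:
            continue
        start = pos - i  # reference offset implied by this seed
        if start < 0 or start + len(read) > len(ref):
            continue
        window = ref[start:start + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches / len(read))
    return best

def matches_reference(read, ref, index, k=19, minid=0.65):
    """True if the read aligns to the reference at or above minid."""
    return best_identity(read, ref, index, k) >= minid
```

A shorter seed length finds more candidate placements for error-rich reads, and the identity threshold then decides which placements count as reference matches.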

Barcode Quality Assessment

When barcodeStats is configured:

Hamming distance calculation via barcodeStats.calcHdist() against expected barcodes
Per-tile accumulation: mt.barcodes+=barcodesPerRead and mt.barcodeHDistSum+=hdist
Support for dual-indexing through barcodesPerRead=2
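
The per-tile barcode accumulation can be sketched as below. The sketch assumes that a dual-index barcode is written as two sequences joined by "+", that the Hamming distance is taken over the whole string against the closest expected barcode, and that the class and method names are hypothetical stand-ins for the MicroTile fields.

```python
def hamming(a, b):
    """Mismatching positions, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def min_hdist(barcode, expected):
    """Distance from a barcode to the closest expected barcode."""
    return min(hamming(barcode, e) for e in expected)

class TileBarcodeStats:
    """Running barcode totals for one micro-tile."""

    def __init__(self):
        self.barcodes = 0
        self.hdist_sum = 0

    def add(self, barcode, expected):
        barcodes_per_read = barcode.count("+") + 1  # 2 for dual indexing
        self.barcodes += barcodes_per_read
        self.hdist_sum += min_hdist(barcode, expected)

    def mean_hdist(self):
        return self.hdist_sum / self.barcodes if self.barcodes else 0.0
```

A tile with a high mean distance has barcodes that frequently disagree with every expected sequence, which points to base-calling problems during the index reads.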

Statistical Analysis

Flowcell statistics are calculated by FlowCell.calcStats():

Adaptive Binning: widenToTargetReads() method adjusts tile dimensions to meet targetAverageReads
Quality Scoring: Multi-factor scoring combining kmer frequencies, quality scores, and alignment metrics
Spatial Analysis: Position-based quality trends using FlowCell.getMicroTile() coordinate mapping
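
The coordinate mapping behind the spatial analysis can be sketched from the standard Illumina read header layout (instrument:run:flowcell:lane:tile:x:y). The function name is hypothetical, and the integer-division mapping is an assumption about how a position is assigned to a micro-tile; it is not taken from FlowCell.getMicroTile().

```python
def micro_tile_key(header, xsize=500, ysize=500):
    """Return (lane, tile, x_bin, y_bin) for an Illumina-style header."""
    # Drop the leading '@' and the comment after the first whitespace.
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    lane, tile, x, y = (int(f) for f in fields[3:7])
    # Integer division assigns the read to a micro-tile within its tile.
    return lane, tile, x // xsize, y // ysize

header = "@A00178:38:H5NYYDSXX:2:1101:3007:1000 1:N:0:CAACCTA+CTAGGTT"
print(micro_tile_key(header))  # (2, 1101, 6, 2)
```

Reads that share a key fall in the same micro-tile, so their quality metrics are pooled.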

Memory Management

Memory usage is managed through ScheduleMaker and partitioned data structures:

Partitioned Hash Tables: HashArray1D with 31-way partitioning (WAYS=31) reduces memory fragmentation
Streaming Processing: ConcurrentReadInputStream.getReadInputStream() for memory-conscious read processing
Configurable Memory: calcXmx() function with automatic detection and manual override support

Output Formats

The dump file, written by the FlowCell.dump() method, contains per-MicroTile statistics:

Spatial coordinates and read counts from MicroTile fields
Quality score distributions tracked per tile
Kmer frequency analysis results from hit/miss ratios
Barcode accuracy metrics via barcodeHDistSum calculations
Reference contamination levels from MicroAligner3 results

Performance Characteristics

PlotFlowCell processing characteristics based on implementation details:

Scalability: ThreadWaiter.startAndWait() manages concurrent WorkerThread instances scaling with Shared.threads()
Memory Options: BloomFilter mode reduces memory via probabilistic counting vs HashArray1D exact counting
I/O Optimization: ConcurrentReadInputStream with ByteFile.FORCE_MODE_BF2 for parallel file access
Reusability: FlowCell constructor supports loading from existing dump files for rapid re-analysis

Deprecation Notice

Important: PlotFlowCell has been superseded by the filterbytile tool, which provides improved algorithms and additional features for flowcell quality analysis. PlotFlowCell is scheduled for removal after version 39.12. Users should migrate to filterbytile for new analyses.

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org