PlotFlowcell
Generates statistics about flowcell positions. Analyzes a flowcell for low-quality areas and removes reads in those areas. This tool is entirely superseded by filterbytile and is scheduled to be removed after version 39.12.
Basic Usage
plotflowcell.sh in=<input> out=<output>
PlotFlowCell analyzes flowcell position data to identify low-quality tiles and regions based on multiple quality metrics including kmer frequencies, quality scores, and barcode accuracy.
Parameters
Parameters are organized by their function in the flowcell analysis process. All parameters from the shell script are documented here.
Input parameters
- in=<file>
- Primary input file.
- in2=<file>
- Second input file for paired reads in two files.
- indump=<file>
- Specify an already-made dump file to use instead of analyzing the input reads.
- reads=-1
- Process this number of reads, then quit (-1 means all).
- interleaved=auto
- Set true/false to override autodetection of the input file as paired interleaved.
Output parameters
- out=<file>
- Output file for filtered reads.
- dump=<file>
- Write a summary of quality information by coordinates.
Tile parameters
- xsize=500
- Initial width of micro-tiles.
- ysize=500
- Initial height of micro-tiles.
- size=
- Allows setting xsize and ysize to the same value.
- target=800
- Iteratively increase the size of micro-tiles until they contain an average of at least this number of reads.
Other parameters
- trimq=-1
- If set to a positive number, trim reads to that quality level instead of filtering them.
- qtrim=r
- If trimq is positive, perform quality trimming on this end of the reads. Values are r, l, and rl for right, left, and both ends.
Java Parameters
- -Xmx
- This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 GB of RAM; -Xmx200m will specify 200 MB. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Advanced Parameters
Additional parameters available through the Java implementation but not exposed in the shell script:
Kmer Analysis Parameters
- verbose=false
- Print verbose messages during processing.
- pound=true
- Enable pound character processing in headers.
- loadkmers=true
- Load kmers for quality analysis.
- allkmers=false
- If true, process all kmers in each read (equivalent to kmersperread=0); if false, sample one kmer per read (kmersperread=1).
- kmersperread=1
- Number of kmers to sample per read for analysis.
- multithreaded=false
- Enable multithreaded loading and filling operations.
- multiload=false
- Enable multithreaded kmer loading.
- multifill=false
- Enable multithreaded tile filling.
Bloom Filter Parameters
- bloom=false
- Use Bloom filter for kmer counting instead of hash tables.
- bits=4
- Bits per cell in Bloom filter (cbits parameter).
- hashes=3
- Number of hash functions for Bloom filter.
Reference Alignment Parameters
- ref=phix
- Reference sequence for contamination detection (default PhiX).
- minid=0.65
- Minimum identity for reference alignment (65%).
- kalign=19
- Kmer length for reference alignment.
Barcode Analysis Parameters
- expectedbarcodes=null
- File containing expected barcodes for validation.
- shortheader=false
- Use short headers in output.
- longheader=true
- Use long headers in output (default).
Quality Control Parameters
- minpolyg=0
- Minimum poly-G tract length to track.
- trackcycles=false
- Track sequencing cycle information.
Examples
Basic Flowcell Analysis
plotflowcell.sh in=reads.fastq dump=flowcell_stats.txt
Analyze flowcell positions and write statistics to dump file.
Paired-end Analysis with Custom Tile Size
plotflowcell.sh in1=R1.fastq in2=R2.fastq size=1000 target=1000 dump=stats.txt
Process paired reads with an initial micro-tile size of 1000x1000, widening tiles until they contain an average of at least 1000 reads.
Using Pre-computed Dump File
plotflowcell.sh indump=previous_stats.txt target=500
Load existing statistics and widen tiles until they contain an average of at least 500 reads.
Full Analysis with Bloom Filter
plotflowcell.sh in=reads.fastq bloom=t bits=6 hashes=4 multithreaded=t dump=analysis.txt
Use Bloom filter for kmer analysis with 6 bits per cell and 4 hash functions, with multithreading enabled.
Quality Trimming Mode
plotflowcell.sh in=reads.fastq trimq=20 qtrim=rl out=trimmed.fastq
Trim reads to quality 20 on both ends instead of filtering them.
Algorithm Details
Flowcell Analysis Strategy
PlotFlowCell combines several quality metrics to identify problematic regions of a sequencing flowcell. Kmer counts are stored either in HashArray1D tables with 31-way partitioning or, optionally, in a BloomFilter.
Micro-tile Organization
The flowcell is divided into micro-tiles with configurable dimensions (xsize × ysize, default 500×500). The tool uses an adaptive sizing strategy:
- Starts with initial tile dimensions
- Iteratively increases tile size until tiles contain an average of at least 'target' reads (default 800)
- This gives each tile enough reads for a statistically meaningful quality assessment
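The adaptive sizing loop can be sketched as follows. This is a minimal illustration in Python, not the tool's Java code: the function names are hypothetical, the average is taken over non-empty tiles, and the growth rule (doubling the smaller dimension each round) is an assumption, since the exact rule used by the tool is not described here.

```python
from collections import Counter

def average_reads_per_tile(positions, xsize, ysize):
    """Average number of reads over the micro-tiles that contain at least one read."""
    counts = Counter((x // xsize, y // ysize) for x, y in positions)
    return len(positions) / len(counts) if counts else 0.0

def widen_to_target(positions, xsize=500, ysize=500, target=800, max_size=1 << 20):
    """Grow tile dimensions until the average occupancy reaches `target`.

    Growth rule (assumed for illustration): double the smaller dimension each round.
    `max_size` is a hypothetical safety cap so the loop always terminates.
    """
    while (average_reads_per_tile(positions, xsize, ysize) < target
           and min(xsize, ysize) < max_size):
        if ysize <= xsize:
            ysize *= 2
        else:
            xsize *= 2
    return xsize, ysize

# Four reads spread over four 500x500 tiles; a target of 4 forces two widening rounds.
reads = [(0, 0), (600, 0), (0, 600), (600, 600)]
final_xsize, final_ysize = widen_to_target(reads, target=4)
```

With these four reads the tiles grow from 500x500 to 1000x1000, at which point all reads fall into a single tile.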
Multi-threaded Processing Architecture
The implementation uses a parallel processing design with WorkerThread instances:
- Kmer Loading: Uses either HashArray1D with 31-way partitioning (keySets[WAYS=31]) or BloomFilter with configurable cbits (default 4) and hashes (default 3)
- Tile Filling: Multi-threaded analysis via spawnThreads() method with per-thread FlowCell objects
- Accumulation: Thread-safe merging using synchronized accumulate() method and ReadWriteLock
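The per-thread accumulation pattern described above can be illustrated with a small Python sketch. It is not the tool's implementation: TileCounts, count_tiles, and run are hypothetical names, a plain mutex stands in for the synchronized method and ReadWriteLock, and only read counts are tracked.

```python
import threading
from collections import Counter

class TileCounts:
    """Shared accumulator: per-tile read counts merged from worker threads."""

    def __init__(self):
        self.counts = Counter()
        self._lock = threading.Lock()

    def accumulate(self, local_counts):
        # Only the merge is serialized; counting itself happens per thread without locking.
        with self._lock:
            self.counts.update(local_counts)

def count_tiles(positions, shared, xsize=500, ysize=500):
    local = Counter()  # thread-private state, analogous to a per-thread FlowCell
    for x, y in positions:
        local[(x // xsize, y // ysize)] += 1
    shared.accumulate(local)

def run(chunks):
    """Process each chunk of (x, y) positions in its own thread and return merged counts."""
    shared = TileCounts()
    threads = [threading.Thread(target=count_tiles, args=(chunk, shared)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.counts
```

Keeping the counters thread-private until a single merge step means the lock is taken once per worker rather than once per read.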
Quality Metrics Collection
For each micro-tile, the algorithm computes quality statistics through MicroTile.add() methods:
Kmer-based Quality Assessment
- 31-mer Analysis: Uses k=31, k2=30 with canonical kmer selection via AminoAcid.reverseComplementBinaryFast()
- Hit/Miss Ratios: Tracks kmers above frequency cutoffs (cutoff=2 for all kmers, cutoff=1 for sampled)
- Depth Accumulation: Sums kmer depths through mt.depthSum+=value in processTileKmers()
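As an illustration of these three steps, the sketch below selects canonical kmers, looks up their depth, and tallies hits, misses, and summed depth. It is a simplified Python sketch with hypothetical names; strings replace the 2-bit encoding, the larger of the kmer and its reverse complement is taken as canonical, and a depth equal to the cutoff is treated as a hit. Both of those choices are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the larger of a kmer and its reverse complement (assumed convention)."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return max(kmer, rc)

def canonical_kmers(seq, k=31):
    """Yield the canonical form of every kmer that contains only A, C, G, T."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in "ACGT" for base in kmer):
            yield canonical(kmer)

def tile_kmer_stats(seq, counts, k=31, cutoff=2):
    """Return (hits, misses, depth_sum) for one read against a kmer-count table."""
    hits = misses = depth_sum = 0
    for kmer in canonical_kmers(seq, k):
        depth = counts.get(kmer, 0)
        depth_sum += depth
        if depth >= cutoff:  # assumed: reaching the cutoff counts as a hit
            hits += 1
        else:
            misses += 1
    return hits, misses, depth_sum
```

Using the canonical form means a kmer and its reverse complement share one table entry, so reads from either strand contribute to the same depth.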
Bloom Filter Mode
Alternative analysis using space-efficient BloomFilter class:
- Configurable cbits per cell (default 4) and hash functions (default 3)
- BloomFilterCorrector.fillKmers() and fillCounts() for error detection
- Estimates unique kmers from filter occupancy via estimateUniqueKmersFromUsedFraction()
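The idea behind a counting Bloom filter with a fixed number of bits per cell can be sketched as below. This is a generic textbook construction in Python, not the BloomFilter class itself: the class name, the hashing scheme, and the occupancy formula n ≈ -(cells/hashes) · ln(1 - usedFraction) are assumptions, and the tool's own estimator may differ.

```python
import hashlib
import math

class CountingBloomFilter:
    """Counting Bloom filter with saturating counters of `cbits` bits each."""

    def __init__(self, cells, cbits=4, hashes=3):
        self.cells = [0] * cells
        self.max_count = (1 << cbits) - 1  # counters saturate instead of overflowing
        self.hashes = hashes

    def _indexes(self, kmer):
        # One salted hash per hash function (hashing scheme chosen for illustration only).
        for i in range(self.hashes):
            digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % len(self.cells)

    def increment(self, kmer):
        for idx in set(self._indexes(kmer)):
            self.cells[idx] = min(self.cells[idx] + 1, self.max_count)

    def count(self, kmer):
        # The smallest counter is the least-inflated estimate of the true count.
        return min(self.cells[idx] for idx in self._indexes(kmer))

    def used_fraction(self):
        return sum(1 for c in self.cells if c) / len(self.cells)

    def estimate_unique(self):
        """Estimate distinct inserted items from the fraction of non-zero cells."""
        used = self.used_fraction()
        if used >= 1.0:
            return float("inf")
        return -(len(self.cells) / self.hashes) * math.log(1.0 - used)
```

With cbits=4 each counter tops out at 15, which is enough to separate low-depth (likely erroneous) kmers from well-supported ones while using far less memory than exact counts.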
Reference Contamination Detection
Optional PhiX contamination analysis through MicroAligner3:
- MicroIndex3 with configurable alignK (default 19) for reference indexing
- Identity threshold minIdentity (default 0.65f) for alignment scoring
- Per-read alignment via mapper.map() calls
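How an identity threshold such as 0.65 separates reference reads from other reads can be shown with a deliberately simplified Python sketch. It assumes an ungapped alignment of a read against an equal-length reference window; the function names are hypothetical, and the scoring used by MicroAligner3 is not described here and is likely more involved.

```python
def identity(read, ref_window):
    """Fraction of matching positions in an ungapped, equal-length alignment."""
    if not read or len(read) != len(ref_window):
        raise ValueError("read and reference window must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(read, ref_window))
    return matches / len(read)

def passes_identity(read, ref_window, min_identity=0.65):
    """True if the read matches the reference window at or above the threshold."""
    return identity(read, ref_window) >= min_identity
```

A read with 8 of 10 matching bases (identity 0.8) would pass the default threshold, while one with 6 of 10 (identity 0.6) would not.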
Barcode Quality Assessment
When barcodeStats is configured:
- Hamming distance calculation via barcodeStats.calcHdist() against expected barcodes
- Per-tile accumulation: mt.barcodes+=barcodesPerRead and mt.barcodeHDistSum+=hdist
- Support for dual-indexing through barcodesPerRead=2
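The per-tile barcode accumulation can be sketched as follows. This is an illustrative Python sketch with hypothetical names; it assumes dual-index barcodes are written as two sequences joined by "+", and that the Hamming distance is taken to the closest expected barcode of the same length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_hdist(barcode, expected):
    """Distance to the closest expected barcode; len(barcode) if none has the same length."""
    return min((hamming(barcode, e) for e in expected if len(e) == len(barcode)),
               default=len(barcode))

class TileBarcodeStats:
    """Per-tile accumulator mirroring the barcodes and barcodeHDistSum counters."""

    def __init__(self):
        self.barcodes = 0
        self.hdist_sum = 0

    def add(self, barcode, expected):
        self.barcodes += 2 if "+" in barcode else 1  # assumed dual-index delimiter
        self.hdist_sum += min_hdist(barcode, expected)

    def mean_hdist(self):
        return self.hdist_sum / self.barcodes if self.barcodes else 0.0
```

A tile whose mean distance is well above that of its neighbours contains more barcode errors than expected, which is one indication of a low-quality region.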
Statistical Analysis
Flowcell statistics are calculated by FlowCell.calcStats():
- Adaptive Binning: widenToTargetReads() method adjusts tile dimensions to meet targetAverageReads
- Quality Scoring: Multi-factor scoring combining kmer frequencies, quality scores, and alignment metrics
- Spatial Analysis: Position-based quality trends using FlowCell.getMicroTile() coordinate mapping
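The coordinate mapping behind the spatial analysis can be illustrated with a short Python sketch. It assumes standard Illumina read headers of the form instrument:run:flowcell:lane:tile:x:y followed by an optional comment; the function names are hypothetical and other header layouts are not handled.

```python
def parse_position(header):
    """Extract (lane, tile, x, y) from an Illumina-style read header."""
    name = header.lstrip("@").split()[0]  # drop the comment after the first whitespace
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError("header does not contain lane:tile:x:y fields")
    lane, tile, x, y = (int(f) for f in fields[3:7])
    return lane, tile, x, y

def micro_tile_key(header, xsize=500, ysize=500):
    """Map a read to its micro-tile as (lane, tile, x index, y index)."""
    lane, tile, x, y = parse_position(header)
    return lane, tile, x // xsize, y // ysize

# With 500x500 micro-tiles, x=3007 falls in column 6 and y=1000 in row 2.
key = micro_tile_key("@A00178:38:H5NYYDSXX:2:1101:3007:1000 1:N:0:CAACCTAG+TGGTACAG")
```

Integer division by the tile dimensions is what makes widening cheap: changing xsize or ysize only changes which reads share a key.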
Memory Management
Memory usage is controlled through ScheduleMaker and partitioned data structures:
- Partitioned Hash Tables: HashArray1D with 31-way partitioning (WAYS=31) reduces memory fragmentation
- Streaming Processing: ConcurrentReadInputStream.getReadInputStream() for memory-conscious read processing
- Configurable Memory: calcXmx() function with automatic detection and manual override support
Output Formats
The dump file, written by FlowCell.dump(), contains per-MicroTile statistics:
- Spatial coordinates and read counts from MicroTile fields
- Quality score distributions tracked per tile
- Kmer frequency analysis results from hit/miss ratios
- Barcode accuracy metrics via barcodeHDistSum calculations
- Reference contamination levels from MicroAligner3 results
Performance Characteristics
Processing characteristics of PlotFlowCell, based on its implementation:
- Scalability: ThreadWaiter.startAndWait() manages concurrent WorkerThread instances scaling with Shared.threads()
- Memory Options: BloomFilter mode reduces memory via probabilistic counting vs HashArray1D exact counting
- I/O Optimization: ConcurrentReadInputStream with ByteFile.FORCE_MODE_BF2 for parallel file access
- Reusability: FlowCell constructor supports loading from existing dump files for rapid re-analysis
Deprecation Notice
Important: PlotFlowCell has been superseded by the filterbytile tool, which provides improved algorithms and additional features for flowcell quality analysis. PlotFlowCell is scheduled for removal after version 39.12. Users should migrate to filterbytile for new analyses.
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org