IceCreamGrader

Basic Usage

icecreamgrader.sh in=<input file>

IceCreamGrader analyzes reads generated by IceCreamMaker and classifies each one as either a "triangle read" (ice cream read) or a normal read. It then reports the number and percentage of good versus bad reads and bases.

Parameters

IceCreamGrader uses a minimal parameter set, focusing on input specification and standard BBTools functionality.

Standard parameters

in=<file>: Input file containing reads to grade. Must be reads generated by IceCreamMaker with custom tab-delimited headers containing metadata about passes, subreads, missing segments, adapters, and error rates. Supports standard file formats (FASTQ, FASTA, compressed files).
verbose=f: Print verbose output messages during processing. When enabled, shows detailed information about stream initialization and processing stages.
maxreads=-1: Maximum number of reads to process from the input file. Default -1 processes all reads. Useful for testing on large datasets or limiting analysis to a subset of reads.
overwrite=t: Allow overwriting of existing output files. When false, prevents accidental overwriting of existing files.
append=f: Append output to existing files rather than overwriting them. When true, adds new results to the end of existing output files.

Examples

Basic Triangle Read Analysis

icecreamgrader.sh in=icecream_reads.fq

Analyzes reads from IceCreamMaker output and reports statistics on triangle reads vs normal reads.

Verbose Processing

icecreamgrader.sh in=icecream_reads.fq verbose=t

Runs analysis with detailed output messages showing processing stages and stream initialization.

Limited Read Processing

icecreamgrader.sh in=large_dataset.fq maxreads=10000

Processes only the first 10,000 reads for quick analysis of large datasets.

Algorithm Details

Triangle Read Detection

IceCreamGrader identifies "ice cream" or triangle reads by parsing the custom tab-delimited headers created by IceCreamMaker. The detection algorithm examines the subreads field in the header:

Triangle reads (ice cream): Reads with subreads > 1 are classified as problematic triangle reads
Normal reads (good): Reads with subreads = 1 are classified as normal, high-quality reads
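
The rule above can be sketched in a few lines of Python. This is an illustration only, not the ReadBuilder.isIceCream() source: the function name is_ice_cream and the example header values are hypothetical, and the sketch assumes the subreads field appears as a tab-delimited key=value pair as shown in the header format.

```python
def is_ice_cream(header: str) -> bool:
    """Return True when the header's subreads field is greater than 1."""
    # The first tab-delimited token is the read name; the rest are key=value pairs.
    for field in header.split("\t")[1:]:
        key, sep, value = field.partition("=")
        if sep and key == "subreads":
            return int(value) > 1
    raise ValueError("header has no subreads field")


# Hypothetical header values, following the documented field layout.
example = ("m1_2_3/7/0_5000\tpasses=2.50\tfullPasses=1\tsubreads=3"
           "\tmissing=0\tadapters=2\terrorRate=0.100")
print(is_ice_cream(example))
```

A header with no subreads field raises an error here rather than being silently counted as good; how the real tool treats such reads is not specified in this guide.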

Header Format Analysis

The tool expects reads with headers containing tab-delimited metadata fields:

m1_2_3/zmw/start_stop	passes=X.XX	fullPasses=N	subreads=N	missing=N	adapters=N	errorRate=X.XXX

Key components analyzed:

ZMW ID: Zero-mode waveguide identifier
Movie coordinates: Start and stop positions in the movie
Passes: Number of passes through the template, given as a fractional value (passes) and as an integer count (fullPasses)
Subreads: Critical field for triangle detection - values > 1 indicate triangle reads
Missing segments: Count of missing template segments
Adapters: Number of adapter sequences detected
Error rate: Estimated error rate for the read
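
To make the field layout concrete, here is a minimal Python sketch that splits such a header into typed fields. The IceCreamHeader class, the parse_header function, and the sample values are assumptions made for illustration; the sketch also assumes the ZMW ID and coordinates are integers and that every field listed above is present.

```python
from dataclasses import dataclass


@dataclass
class IceCreamHeader:
    movie: str
    zmw: int
    start: int
    stop: int
    passes: float
    full_passes: int
    subreads: int
    missing: int
    adapters: int
    error_rate: float


def parse_header(header: str) -> IceCreamHeader:
    """Split a tab-delimited header into its name parts and metadata fields."""
    name, *fields = header.split("\t")
    movie, zmw, coords = name.split("/")   # e.g. m1_2_3 / zmw / start_stop
    start, stop = coords.split("_")
    meta = dict(f.split("=", 1) for f in fields)
    return IceCreamHeader(
        movie=movie, zmw=int(zmw), start=int(start), stop=int(stop),
        passes=float(meta["passes"]), full_passes=int(meta["fullPasses"]),
        subreads=int(meta["subreads"]), missing=int(meta["missing"]),
        adapters=int(meta["adapters"]), error_rate=float(meta["errorRate"]),
    )


# Hypothetical header values.
h = parse_header("m1_2_3/42/100_900\tpasses=1.75\tfullPasses=1\tsubreads=2"
                 "\tmissing=0\tadapters=1\terrorRate=0.120")
print(h.zmw, h.stop - h.start, h.subreads)
```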

Processing Strategy

The tool processes reads sequentially in a single thread, pulling them from a ConcurrentReadInputStream:

Memory allocation: Uses fixed 200MB heap space (-Xmx200m -Xms200m)
Stream processing: ConcurrentReadInputStream.nextList() processes reads sequentially using ListNum<Read> without storing entire dataset in memory
Header parsing: ReadBuilder.isIceCream() uses String.split("\t") to extract tab-delimited metadata fields
Statistics accumulation: Maintains running long counters (goodReads, goodBases, badReads, badBases) incremented per read
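
The streaming-plus-counters pattern described above might look like the following in Python. It is a simplified stand-in, not the Java implementation: it assumes four-line FASTQ records, the grade, subreads_of, and fastq helpers are hypothetical, and max_reads mirrors the maxreads=-1 convention (negative means no limit).

```python
import io
from itertools import islice


def subreads_of(header):
    """Return the integer subreads value from a tab-delimited header."""
    for field in header.split("\t")[1:]:
        key, _, value = field.partition("=")
        if key == "subreads":
            return int(value)
    raise ValueError("header has no subreads field")


def grade(handle, max_reads=-1):
    """Stream FASTQ records one at a time, keeping only four running counters."""
    counts = {"good_reads": 0, "good_bases": 0, "bad_reads": 0, "bad_bases": 0}
    processed = 0
    while max_reads < 0 or processed < max_reads:
        record = list(islice(handle, 4))      # header, bases, '+', qualities
        if len(record) < 4:
            break                             # end of input
        header = record[0].rstrip("\r\n")[1:]  # drop the leading '@'
        bases = len(record[1].strip())
        kind = "bad" if subreads_of(header) > 1 else "good"
        counts[kind + "_reads"] += 1
        counts[kind + "_bases"] += bases
        processed += 1
    return counts


def fastq(name_and_fields, seq):
    """Build one four-line FASTQ record for the demo input."""
    return "@" + name_and_fields + "\n" + seq + "\n+\n" + "I" * len(seq) + "\n"


demo = (fastq("m1_2_3/1/0_8\tsubreads=1", "ACGTACGT")
        + fastq("m1_2_3/2/0_12\tsubreads=3", "ACGTACGTACGT")
        + fastq("m1_2_3/3/0_4\tsubreads=1", "ACGT"))
print(grade(io.StringIO(demo)))
```

Because only the current record and four integers are held at any time, memory use does not grow with the size of the input.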

Output Statistics

IceCreamGrader reports the following statistics for the analyzed dataset:

Processing summary: Total reads and bases processed with timing information
Good reads: Count and percentage of normal reads (subreads = 1)
Good bases: Total bases in good reads with percentage
Bad reads: Count and percentage of triangle reads (subreads > 1)
Bad bases: Total bases in triangle reads with percentage
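
The percentages follow directly from the four counters. The Python sketch below shows the arithmetic only; the function names and the dictionary keys are hypothetical, and the tool's actual output layout is not reproduced here.

```python
def percent(part, total):
    """Percentage of total, returning 0.0 for an empty dataset."""
    return 100.0 * part / total if total else 0.0


def summarize(good_reads, good_bases, bad_reads, bad_bases):
    """Derive read and base percentages from the four running counters."""
    total_reads = good_reads + bad_reads
    total_bases = good_bases + bad_bases
    return {
        "good_reads_pct": percent(good_reads, total_reads),
        "bad_reads_pct": percent(bad_reads, total_reads),
        "good_bases_pct": percent(good_bases, total_bases),
        "bad_bases_pct": percent(bad_bases, total_bases),
    }


# Example: 3 good reads (300 bases) and 1 triangle read (100 bases).
stats = summarize(good_reads=3, good_bases=300, bad_reads=1, bad_bases=100)
for name, value in stats.items():
    print(f"{name}\t{value:.2f}%")
```

Read and base percentages can differ substantially when triangle reads are longer or shorter than normal reads, which is why both are reported.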

Integration with IceCreamMaker

IceCreamGrader is designed as the quality assessment companion to IceCreamMaker:

Complementary workflow: IceCreamMaker generates synthetic reads; IceCreamGrader evaluates their quality
Header compatibility: Specifically parses the custom header format generated by IceCreamMaker
Quality metrics: Provides feedback on the proportion of problematic triangle reads in generated datasets
Performance validation: Enables assessment of IceCreamMaker parameter settings and their impact on read quality

Performance Characteristics

Memory Usage

Heap space: Fixed 200MB allocation (-Xmx200m -Xms200m)
Stream-based processing: Reads are streamed through ConcurrentReadInputStream, so the full dataset is never held in memory
Scalability: Memory usage independent of input file size

Processing Speed

Single-threaded: Uses single processing thread for sequential header parsing
I/O bound: Performance primarily limited by disk read speed
String operations: Uses String.split() and Integer.parseInt() for header analysis

File Format Support

FASTQ/FASTA: Standard sequence formats with custom headers
Compression: Automatic handling of gzipped files
Custom headers: Requires IceCreamMaker-generated header format
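
For readers scripting around the tool, transparent handling of gzipped input can be approximated as below. This Python sketch assumes compression is detected from a .gz file extension, which is an assumption about the mechanism rather than a statement of how the tool does it; open_reads is a hypothetical helper.

```python
import gzip


def open_reads(path: str):
    """Open a read file as text, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")
```

The returned handle yields text lines in both cases, so downstream header parsing does not need to know whether the file was compressed.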

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org