IceCreamGrader
Measures the rate of triangle reads in a file generated by IceCreamMaker with custom headers.
Basic Usage
icecreamgrader.sh in=<input file>
IceCreamGrader analyzes reads generated by IceCreamMaker and determines which reads are "triangle reads" (ice cream reads) versus normal reads. It outputs statistics showing the percentage of good versus bad reads and bases.
Parameters
IceCreamGrader takes only a few parameters: the input file plus standard BBTools options.
Standard parameters
- in=<file>
- Input file containing reads to grade. Must be reads generated by IceCreamMaker with custom tab-delimited headers containing metadata about passes, subreads, missing segments, adapters, and error rates. Supports standard file formats (FASTQ, FASTA, compressed files).
- verbose=f
- Print verbose output messages during processing. When enabled, shows detailed information about stream initialization and processing stages.
- maxreads=-1
- Maximum number of reads to process from the input file. Default -1 processes all reads. Useful for testing on large datasets or limiting analysis to a subset of reads.
- overwrite=t
- Allow overwriting of existing output files. When false, prevents accidental overwriting of existing files.
- append=f
- Append output to existing files rather than overwriting them. When true, adds new results to the end of existing output files.
Examples
Basic Triangle Read Analysis
icecreamgrader.sh in=icecream_reads.fq
Analyzes reads from IceCreamMaker output and reports statistics on triangle reads vs normal reads.
Verbose Processing
icecreamgrader.sh in=icecream_reads.fq verbose=t
Runs analysis with detailed output messages showing processing stages and stream initialization.
Limited Read Processing
icecreamgrader.sh in=large_dataset.fq maxreads=10000
Processes only the first 10,000 reads for quick analysis of large datasets.
Algorithm Details
Triangle Read Detection
IceCreamGrader identifies "ice cream" or triangle reads by parsing the custom tab-delimited headers created by IceCreamMaker. The detection algorithm examines the subreads field in the header:
- Triangle reads (ice cream): Reads with subreads > 1 are classified as problematic triangle reads
- Normal reads (good): Reads with subreads = 1 are classified as normal, high-quality reads
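The classification rule above can be sketched in a few lines. This is an illustrative Python sketch, not the tool's Java code: the function name is_triangle is hypothetical, the example headers are made up, and it assumes the header carries a tab-separated subreads=N field as described in this section.

```python
def is_triangle(header: str) -> bool:
    """Return True when the header's subreads field is greater than 1.

    Assumption: fields are tab-delimited key=value pairs, one of which
    is subreads=N. A header without that field is treated as an error.
    """
    for field in header.split("\t"):
        if field.startswith("subreads="):
            return int(field[len("subreads="):]) > 1
    raise ValueError("no subreads field in header: " + repr(header))


# Made-up headers for illustration only.
good = "m1_2_3/7/0_5000\tpasses=0.85\tfullPasses=0\tsubreads=1"
bad = "m1_2_3/8/0_9000\tpasses=2.10\tfullPasses=1\tsubreads=3"
print(is_triangle(good), is_triangle(bad))
```

Raising on a missing field is a choice made for this sketch; how the tool itself treats malformed headers is not described here.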
Header Format Analysis
The tool expects reads with headers containing tab-delimited metadata fields:
m1_2_3/zmw/start_stop passes=X.XX fullPasses=N subreads=N missing=N adapters=N errorRate=X.XXX
Key components analyzed:
- ZMW ID: Zero-mode waveguide identifier
- Movie coordinates: Start and stop positions in the movie
- Passes: Number of passes through the template
- Subreads: Critical field for triangle detection - values > 1 indicate triangle reads
- Missing segments: Count of missing template segments
- Adapters: Number of adapter sequences detected
- Error rate: Estimated error rate for the read
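To make the field layout concrete, the sketch below splits one header into its components. It is a hedged illustration in Python rather than the tool's own parser: parse_header is a hypothetical name, the example header is invented, and the sketch assumes that fields are separated by tab characters, that the ZMW and coordinates are integers, and that passes and errorRate are decimal values.

```python
def parse_header(header: str) -> dict:
    """Split an IceCreamMaker-style header into typed fields (sketch)."""
    fields = header.split("\t")
    # First field: movie name, ZMW ID, and start_stop coordinates.
    movie, zmw, coords = fields[0].split("/")
    start, stop = coords.split("_")
    parsed = {"movie": movie, "zmw": int(zmw),
              "start": int(start), "stop": int(stop)}
    # Remaining fields are key=value pairs.
    for field in fields[1:]:
        key, _, value = field.partition("=")
        is_decimal = key in ("passes", "errorRate")
        parsed[key] = float(value) if is_decimal else int(value)
    return parsed


# Invented example header; real values come from IceCreamMaker.
example = ("m1_2_3/42/100_5100\tpasses=1.50\tfullPasses=1\t"
           "subreads=2\tmissing=0\tadapters=1\terrorRate=0.120")
info = parse_header(example)
print(info["zmw"], info["subreads"], info["errorRate"])
```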
Processing Strategy
The tool processes reads sequentially on a single thread, reading them through ConcurrentReadInputStream:
- Memory allocation: Uses fixed 200MB heap space (-Xmx200m -Xms200m)
- Stream processing: ConcurrentReadInputStream.nextList() returns reads in ListNum<Read> batches, so the entire dataset is never held in memory
- Header parsing: ReadBuilder.isIceCream() uses String.split("\t") to extract tab-delimited metadata fields
- Statistics accumulation: Maintains running long counters (goodReads, goodBases, badReads, badBases) incremented per read
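The streaming-and-counting pattern described above can be illustrated as follows. This is a simplified Python sketch, not the Java implementation: it assumes four-line FASTQ records, borrows the counter names from the list above, and uses a hypothetical function name (grade) and invented demo reads.

```python
import io


def grade(fastq_lines):
    """Stream 4-line FASTQ records and tally good/bad reads and bases."""
    counts = {"goodReads": 0, "goodBases": 0, "badReads": 0, "badBases": 0}
    lines = iter(fastq_lines)
    for header in lines:
        sequence = next(lines).strip()
        next(lines)  # '+' separator line, ignored
        next(lines)  # quality line, ignored
        subreads = None
        for field in header.strip().split("\t"):
            if field.startswith("subreads="):
                subreads = int(field.split("=", 1)[1])
        if subreads is None:
            raise ValueError("header lacks a subreads field")
        # Only running counters are kept; each record is dropped after use.
        kind = "bad" if subreads > 1 else "good"
        counts[kind + "Reads"] += 1
        counts[kind + "Bases"] += len(sequence)
    return counts


# Two invented records: one normal read, one triangle read.
demo = io.StringIO(
    "@m1_2_3/1/0_4\tsubreads=1\nACGT\n+\nIIII\n"
    "@m1_2_3/2/0_6\tsubreads=3\nACGTAC\n+\nIIIIII\n"
)
counts = grade(demo)
print(counts)
```

Because only one record and four integers are live at any time, memory use in this sketch does not grow with input size, which mirrors the streaming design described above.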
Output Statistics
IceCreamGrader reports the following statistics for the analyzed dataset:
- Processing summary: Total reads and bases processed with timing information
- Good reads: Count and percentage of normal reads (subreads = 1)
- Good bases: Total bases in good reads with percentage
- Bad reads: Count and percentage of triangle reads (subreads > 1)
- Bad bases: Total bases in triangle reads with percentage
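The percentages above follow directly from the four counters. The Python sketch below shows the arithmetic under the assumption that each percentage is taken relative to the total of good plus bad; the function name summarize, the row labels, and the example numbers are illustrative, and the tool's actual output layout is not reproduced here.

```python
def summarize(good_reads, good_bases, bad_reads, bad_bases):
    """Return (label, count, percent) rows for good and bad reads and bases."""
    total_reads = good_reads + bad_reads
    total_bases = good_bases + bad_bases

    def pct(part, whole):
        # Guard against empty input to avoid dividing by zero.
        return 100.0 * part / whole if whole else 0.0

    return [
        ("Good reads", good_reads, pct(good_reads, total_reads)),
        ("Good bases", good_bases, pct(good_bases, total_bases)),
        ("Bad reads", bad_reads, pct(bad_reads, total_reads)),
        ("Bad bases", bad_bases, pct(bad_bases, total_bases)),
    ]


# Invented counts: 900 normal reads and 100 triangle reads.
rows = summarize(900, 4_500_000, 100, 1_500_000)
for label, count, percent in rows:
    print(f"{label}:\t{count}\t{percent:.2f}%")
```

Note that the read and base percentages can differ substantially: in this invented example triangle reads are 10% of reads but 25% of bases, because they are longer on average.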
Integration with IceCreamMaker
IceCreamGrader is designed as the quality assessment companion to IceCreamMaker:
- Complementary workflow: IceCreamMaker generates synthetic reads; IceCreamGrader evaluates their quality
- Header compatibility: Specifically parses the custom header format generated by IceCreamMaker
- Quality metrics: Provides feedback on the proportion of problematic triangle reads in generated datasets
- Performance validation: Enables assessment of IceCreamMaker parameter settings and their impact on read quality
Performance Characteristics
Memory Usage
- Heap space: Fixed 200MB allocation (-Xmx200m -Xms200m)
- Stream-based processing: Reads are streamed through ConcurrentReadInputStream, so the full dataset is never stored in memory
- Scalability: Memory usage is independent of input file size
Processing Speed
- Single-threaded: Uses a single processing thread for sequential header parsing
- I/O bound: Performance is primarily limited by disk read speed
- String operations: Uses String.split() and Integer.parseInt() for header analysis
File Format Support
- FASTQ/FASTA: Standard sequence formats with custom headers
- Compression: Automatic handling of gzipped files
- Custom headers: Requires IceCreamMaker-generated header format
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org