TileDump
Processes a tile dump from FilterByTile. This tool can modify tile dimensions, apply quality filters, and write processed tile data for downstream analysis.
Basic Usage
tiledump.sh in=<input file> out=<output file>
Reads a tile dump file, processes it according to the specified parameters, and writes the result to an output file.
Parameters
Parameters are organized by their function in the tile processing workflow.
Standard Parameters
- in=<file>
- Input dump file containing tile data to be processed.
- out=<file>
- Output dump file where processed tile data will be written.
- overwrite=t
- (ow) Set to false to force the program to abort rather than overwrite an existing file.
Processing Parameters
- x=-1
- Widen tiles to at least this X width. Default -1 (no widening).
- y=-1
- Widen tiles to at least this Y width. Default -1 (no widening).
- reads=-1
- Widen tiles to at least this average number of reads. Default -1 (no widening based on reads).
- alignedreads=250
- Average aligned reads per tile for error rate calibration. Used for statistical calculations.
- verbose=f
- Set to true to print verbose output during processing.
- blur=f
- (blurtiles, smoothtiles) Set to true to blur/smooth tiles during processing.
Quality Threshold Parameters
- qdeviations=2.4
- (qd) Number of standard deviations for quality thresholds. Default 2.4.
- udeviations=1.5
- (ud) Number of standard deviations for uniqueness thresholds. Default 1.5.
- edeviations=3.0
- (ed) Number of standard deviations for error-free thresholds. Default 3.0.
- pgdeviations=1.4
- (pgd) Number of standard deviations for poly-G thresholds. Default 1.4.
Fraction Threshold Parameters
- qfraction=0.08
- (qf) Quality fraction threshold for tile filtering. Default 0.08.
- ufraction=0.01
- (uf) Uniqueness fraction threshold for tile filtering. Default 0.01.
- efraction=0.2
- (ef) Error-free fraction threshold for tile filtering. Default 0.2.
- pgfraction=0.2
- (pgf) Poly-G fraction threshold for tile filtering. Default 0.2.
Absolute Threshold Parameters
- qabsolute=2.0
- (qa) Absolute quality threshold for tile filtering. Default 2.0.
- uabsolute=1.0
- (ua) Absolute uniqueness threshold for tile filtering. Default 1.0.
- eabsolute=6.0
- (ea) Absolute error-free threshold for tile filtering. Default 6.0.
- pgabsolute=0.2
- (pga) Absolute poly-G threshold for tile filtering. Default 0.2.
Filtering Parameters
- maxbadfraction=0.4
- (mbf, mdf, maxdiscardfraction) Maximum fraction of tiles to discard. Default 0.4.
- impliederrorrate=0.012
- (inferrederrorrate, ier, maxier) Maximum implied error rate threshold. Default 0.012.
- inferredquality=
- (impliedquality, miniq) Minimum inferred quality score. Converted to error rate using Phred scale.
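The Phred conversion mentioned for inferredquality follows the standard relation error = 10^(-Q/10). The sketch below shows only that arithmetic; the function names are illustrative and are not part of TileDump.

```python
import math

def phred_to_error_rate(quality):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-quality / 10.0)

def error_rate_to_phred(error_rate):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    return -10.0 * math.log10(error_rate)

# The default impliederrorrate of 0.012 is a little above Q19 on this scale.
default_as_quality = error_rate_to_phred(0.012)
```

On this scale Q20 corresponds to an error rate of 0.01 and Q30 to 0.001, so a higher inferredquality means a lower maximum error rate.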
Java Parameters
- -Xmx
- This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
- -eoom
- This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Examples
Basic tile dump processing
tiledump.sh in=tiles.dump out=processed_tiles.dump
Process a tile dump file with default parameters.
Widen tiles by dimensions
tiledump.sh in=tiles.dump out=widened_tiles.dump x=100 y=100
Widen tiles to at least 100 units in both the X and Y dimensions.
Filter tiles with custom thresholds
tiledump.sh in=tiles.dump out=filtered_tiles.dump \
qdeviations=3.0 udeviations=2.0 maxbadfraction=0.3
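Raise the quality and uniqueness deviation thresholds above their defaults (2.4 and 1.5), so a tile must deviate further from the mean to be flagged, and limit discarded tiles to 30%.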
Process with verbose output and blurring
tiledump.sh in=tiles.dump out=smooth_tiles.dump \
verbose=t blur=t alignedreads=500
Process tiles with verbose output, apply smoothing, and raise the average aligned reads per tile used for error rate calibration to 500.
Algorithm Details
Tile Processing Workflow
TileDump implements a multi-stage tile processing and filtering system using FlowCell and MicroTile data structures for Illumina sequencing data:
1. Tile Widening Strategy
- Dimension-based widening: Expands tiles to meet minimum X and Y size requirements
- Read-based widening: Adjusts tile boundaries to achieve target read counts per tile
- Aligned read calibration: Temporarily widens tiles for error rate calibration when aligned read counts are sufficient
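As a rough illustration of the first two widening modes, the sketch below assumes that a dimension is simply raised to the requested minimum, and that read-based widening doubles the smaller dimension until the estimated average read count reaches the target. Both rules and all function names are assumptions made for illustration; the source does not specify how TileDump grows tiles.

```python
def widen_to_minimum(size, minimum):
    """Return a tile dimension of at least `minimum`; -1 (or any value <= 0) means no widening."""
    if minimum <= 0:
        return size
    return max(size, minimum)

def widen_for_reads(x_size, y_size, avg_reads, target_reads, max_steps=32):
    """Double the smaller dimension until the estimated average reads per tile
    reaches target_reads. Assumes reads scale linearly with tile area."""
    if target_reads <= 0:
        return x_size, y_size, avg_reads
    for _ in range(max_steps):
        if avg_reads >= target_reads:
            break
        if x_size <= y_size:
            x_size *= 2
        else:
            y_size *= 2
        avg_reads *= 2  # doubling the area doubles the expected read count
    return x_size, y_size, avg_reads
```

Under these assumptions, a 500 by 500 tile averaging 100 reads with reads=350 would be doubled once in each dimension before the target is met.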
2. Statistical Analysis
The tool calculates the following statistics for each micro-tile:
- Quality metrics: average quality, error rates, alignment rates
- Uniqueness measures: k-mer uniqueness percentages
- Base composition: ACGTN content, homopolymer analysis
- Error modeling: read and base error rates, implied error calculations
3. Multi-criteria Filtering
Tiles are evaluated using multiple independent criteria:
- Quality filtering: Based on deviation from mean quality scores
- Uniqueness filtering: Removes tiles with excessive repetitive sequences
- Error-free filtering: Identifies tiles with abnormal error patterns
- Poly-G detection: Flags tiles with excessive poly-G runs
- Count-based filtering: Removes tiles with insufficient reads for statistics
4. Adaptive Threshold Management
The filtering system uses a three-tier threshold approach:
- Deviation thresholds: Based on standard deviations from flow cell mean
- Fraction thresholds: Proportional to flow cell average values
- Absolute thresholds: Hard minimum/maximum values
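One plausible way to combine the three tiers is to flag a tile only when it falls below the flow cell mean by more than all three margins at once. The sketch below assumes that combination rule and a "lower is worse" metric such as average quality; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def flag_low_tiles(values, deviations, fraction, absolute):
    """Return indices of tiles whose value is below the mean by more than
    deviations * stdev, fraction * mean, and the absolute margin (all three)."""
    avg = mean(values)
    sd = pstdev(values)
    flagged = []
    for i, value in enumerate(values):
        shortfall = avg - value
        if (shortfall > deviations * sd
                and shortfall > fraction * avg
                and shortfall > absolute):
            flagged.append(i)
    return flagged

# Nine tiles with average quality 35 and one poor tile with average quality 20.
tile_quality = [35.0] * 9 + [20.0]
flagged_default = flag_low_tiles(tile_quality, deviations=2.4, fraction=0.08, absolute=2.0)
flagged_lenient = flag_low_tiles(tile_quality, deviations=4.0, fraction=0.08, absolute=2.0)
```

With this rule, raising any one of the three parameters can only reduce the number of flagged tiles, which is why larger deviation values make the filter more permissive.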
5. Maximum Discard Protection
To prevent over-filtering, the system:
- Limits total discarded tiles to maxbadfraction (default 40%)
- Prioritizes retention of tiles with adequate read counts
- Uses sorting to retain the best tiles when limits are exceeded
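The discard cap can be pictured as a sort followed by truncation. This is a minimal sketch, assuming each flagged tile has a single "badness" score where higher means worse; the scoring, the rounding of the limit, and the function name are assumptions, not the tool's documented behavior.

```python
def cap_discards(scores, flagged, max_bad_fraction=0.4):
    """Keep at most max_bad_fraction of all tiles in the discard set,
    preferring the tiles with the highest badness score.

    scores:  dict mapping tile id -> badness score (higher is worse)
    flagged: iterable of tile ids that failed at least one filter
    """
    limit = int(len(scores) * max_bad_fraction)
    worst_first = sorted(flagged, key=lambda tile: scores[tile], reverse=True)
    return set(worst_first[:limit])

scores = {"t1": 0.9, "t2": 0.1, "t3": 0.8, "t4": 0.7, "t5": 0.2}
discarded = cap_discards(scores, {"t1", "t3", "t4"}, max_bad_fraction=0.4)
```

Here three of five tiles are flagged, but the 40% cap allows only two discards, so the least bad flagged tile (t4) is retained.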
6. Error Rate Modeling
Error rates are predicted using:
- Linear regression models relating uniqueness to error rates
- Separate models for read-level and base-level errors
- Calibration using tiles with sufficient aligned reads
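The regression step can be illustrated with an ordinary least-squares line fitted to calibration tiles and then applied to a tile that lacks enough aligned reads. The data below are invented for illustration, and the choice of plain least squares on one predictor is an assumption; the source says only that linear models relate uniqueness to error rate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Apply the fitted line to a new predictor value."""
    return slope * x + intercept

# Hypothetical calibration tiles: percent unique k-mers vs. measured base error rate.
uniqueness = [90.0, 80.0, 70.0, 60.0]
error_rate = [0.01, 0.02, 0.03, 0.04]
slope, intercept = fit_line(uniqueness, error_rate)

# Estimated error rate for a tile with 75% uniqueness and too few aligned reads.
implied = predict(slope, intercept, 75.0)
```

A second fit of the same form on read-level error rates would give the separate read-level model described in the list above.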
Performance Characteristics
- Memory usage: Scales with flow cell size and tile count
- Processing speed: Linear with number of micro-tiles
- I/O efficiency: Streaming processing for large dump files
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org