PlotReadPosition

Script: plotreadposition.sh Package: hiseq Class: PlotReadPosition.java

Plots Illumina read positions and barcode hamming distance by analyzing read headers and calculating distances to expected barcodes.

Basic Usage

plotreadposition.sh in=<file> out=<file> expected=<file>

This tool processes Illumina FASTQ files to extract positional information and barcode data from read headers, then calculates hamming distances to expected barcodes.

Parameters

Input/Output Parameters

File Parameters

in=<file>
Input FASTQ file containing Illumina reads with positional and barcode information in headers. Can be compressed (gzip).
out=<file>
Output TSV file with three columns: x position, y position, and hamming distance to closest expected barcode.
expected=<file>
names=<file>
barcodes=<file>
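File containing expected barcode sequences, one per line. The three parameter names above are listed together for the same file. The observed barcode in each read header is compared against these sequences to calculate the hamming distance.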

Processing Parameters

Read Processing

maxreads=-1
Maximum number of reads to process. Default (-1) processes all reads in the input file.
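As a rough illustration of what a read limit means for a 4-line FASTQ file, the sketch below yields header lines and stops after a given count. It is not the tool's code: the function name iter_headers is hypothetical, the .gz check by file extension is a simplification, and it assumes that any negative maxreads means "no limit" and that every record spans exactly four lines.

```python
import gzip
import itertools


def iter_headers(path, maxreads=-1):
    """Yield FASTQ header lines without the leading '@'.

    Assumptions (not taken from the tool): records are exactly 4 lines,
    gzip input is recognized by a '.gz' suffix, and a negative maxreads
    means all reads are processed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Lines 0, 4, 8, ... are the headers in a 4-line FASTQ file.
        headers = itertools.islice(handle, 0, None, 4)
        if maxreads >= 0:
            headers = itertools.islice(headers, maxreads)
        for line in headers:
            yield line.rstrip("\r\n")[1:]
```

For example, list(iter_headers("large_sample.fq", maxreads=1000000)) would collect the headers of the first million records only.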

Examples

Basic Position and Barcode Analysis

plotreadposition.sh in=sample.fq out=positions.tsv expected=barcodes.txt

Analyzes read positions and calculates barcode hamming distances for all reads in sample.fq.

Limited Read Processing

plotreadposition.sh in=large_sample.fq out=subset_positions.tsv expected=barcodes.txt maxreads=1000000

Processes only the first 1 million reads from a large FASTQ file.

Compressed Input

plotreadposition.sh in=sample.fq.gz out=positions.tsv expected=expected_barcodes.txt

Processes compressed FASTQ input, automatically detecting gzip format.

Output Format

The output TSV file contains three tab-separated columns with a header line:

x	y	hdist
1001	1523	0
1002	1523	2
1003	1523	1
...
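Because the output is a plain TSV with a header line, it can be summarized with a few lines of Python before plotting. The sketch below counts how many reads fall at each hamming distance. The function name hdist_histogram is hypothetical, and the code assumes the column names x, y, and hdist shown above.

```python
import csv
from collections import Counter


def hdist_histogram(tsv_path):
    """Count reads per hamming distance in a PlotReadPosition-style TSV.

    Assumes a header line with the columns x, y, and hdist.
    """
    counts = Counter()
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            counts[int(row["hdist"])] += 1
    return counts
```

Calling hdist_histogram("positions.tsv") returns a Counter keyed by distance, which gives a quick view of how many reads matched an expected barcode exactly (distance 0) versus with mismatches.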

Algorithm Details

Header Parsing Strategy

PlotReadPosition uses IlluminaHeaderParser2 to extract structured information from Illumina FASTQ headers. Going by the output columns, the fields it needs are each read's x and y positions and its barcode.
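For orientation, a standard Illumina header has the form instrument:run:flowcell:lane:tile:x:y followed by a space and read:filtered:control:barcode. The sketch below pulls x, y, and the barcode out of such a line. It is an illustrative stand-in, not the IlluminaHeaderParser2 implementation; the names HeaderFields and parse_illumina_header are hypothetical, and the example header is made up.

```python
from typing import NamedTuple


class HeaderFields(NamedTuple):
    x: int
    y: int
    barcode: str


def parse_illumina_header(header):
    """Extract x, y, and barcode from a standard Illumina header line.

    Assumes the layout
        instrument:run:flowcell:lane:tile:x:y read:filtered:control:barcode
    and raises ValueError if the positional part has fewer than 7 fields.
    """
    name, _, comment = header.lstrip("@").partition(" ")
    location = name.split(":")
    if len(location) < 7:
        raise ValueError("not a standard Illumina header: " + header)
    x, y = int(location[5]), int(location[6])
    # The barcode is the 4th colon-separated field of the comment, if present.
    comment_fields = comment.split(":")
    barcode = comment_fields[3] if len(comment_fields) >= 4 else ""
    return HeaderFields(x, y, barcode)


fields = parse_illumina_header(
    "@A00123:45:HXXXXXXXX:1:1101:1001:1523 1:N:0:ACGTACGT+TTGACCAA"
)
print(fields.x, fields.y, fields.barcode)
```

Dual-index barcodes appear as two sequences joined by "+", which this sketch keeps as a single string.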

Barcode Distance Calculation

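The tool employs PCRMatrixHDist for barcode matching. The barcode observed in each read header is compared against the expected barcodes, and the hamming distance to the closest one is reported.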
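The quantity being reported can be sketched as a brute-force search over the expected barcodes. This is a minimal illustration of "hamming distance to the closest expected barcode", not the PCRMatrixHDist implementation. The function names are hypothetical, and counting a length difference as extra mismatches is an assumption made here so that unequal-length inputs do not fail.

```python
def load_expected(path):
    """Read expected barcodes, one per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def hamming(a, b):
    """Count mismatching positions; any length difference counts as mismatches."""
    mismatches = sum(1 for p, q in zip(a, b) if p != q)
    return mismatches + abs(len(a) - len(b))


def closest_hdist(observed, expected):
    """Return the smallest hamming distance from observed to any expected barcode.

    Raises ValueError if expected is empty.
    """
    return min(hamming(observed, candidate) for candidate in expected)
```

A linear scan like this costs one comparison per expected barcode per read, which is fine for illustration but may be slower than what the tool does internally.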

Processing Architecture

The implementation uses concurrent stream processing.

Data Structure Usage

PlotReadPosition utilizes specialized data structures for performance.

Performance Characteristics

Use Cases

Technical Notes
