GradeMerge

Script: grademerge.sh
Package: jgi
Class: GradeMergedReads.java

Grades the correctness of merging for synthetic reads whose headers were generated by RandomReads and rewritten by RenameReads.

Basic Usage

grademerge.sh in=<file>

GradeMerge evaluates the quality of read merging by comparing the length of each merged synthetic read against its known insert size. It is designed specifically for reads generated by RandomReads whose headers carry insert size information.

Parameters

GradeMerge takes a file of merged reads as input and, optionally, the original raw read pairs for a dual-stage merging analysis.

Input/Output Parameters

in=<file>
Specify the input file containing merged reads, or 'stdin'. The reads must have synthetic headers containing insert size information (e.g., "insert=250") for proper grading.
raw=<file>
Specify the original raw read pairs before merging. This allows calculation of the percentage of reads that were theoretically mergeable. Use the # symbol for paired files (raw=reads#.fq becomes reads1.fq and reads2.fq).
raw1=<file>
First file of raw paired reads. Use with raw2 to specify paired files explicitly.
raw2=<file>
Second file of raw paired reads. Used in conjunction with raw1.
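The # convention accepted by raw= amounts to substituting 1 and 2 for the placeholder. The following is a minimal sketch of that substitution; the helper name expand_paired_path is hypothetical, and this sketch simply rejects paths without a #, since how GradeMerge treats such paths is not described here.

```python
from typing import Tuple


def expand_paired_path(path: str) -> Tuple[str, str]:
    """Expand a '#' placeholder into the two mate file names.

    Hypothetical helper illustrating the raw=reads#.fq convention;
    it is not part of GradeMerge itself.
    """
    if "#" not in path:
        # This sketch's own choice: refuse paths without a placeholder.
        raise ValueError("path has no '#' placeholder: " + path)
    # Replace only the first '#', yielding the read 1 and read 2 file names.
    return path.replace("#", "1", 1), path.replace("#", "2", 1)


print(expand_paired_path("raw_reads#.fq"))
```

For raw=raw_reads#.fq this yields raw_reads1.fq and raw_reads2.fq, matching the explicit raw1=/raw2= form.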

Processing Parameters

verbose=f
Print additional processing information during execution. Set to true for detailed output about file processing stages.

Examples

Basic Merge Grading

grademerge.sh in=merged_reads.fq

Grades the correctness of merged reads by comparing each merged read's length against the insert size embedded in its synthetic header.

Dual-Stage Analysis with Raw Reads

grademerge.sh in=merged_reads.fq raw=raw_reads#.fq

Analyzes merged reads and also reports what percentage of the original raw read pairs were theoretically mergeable based on their insert sizes.

Verbose Processing

grademerge.sh in=merged_reads.fq raw1=reads_1.fq raw2=reads_2.fq verbose=t

Performs dual-stage analysis with detailed processing information printed to stderr.

Algorithm Details

Merge Quality Assessment

GradeMerge evaluates merging accuracy with a delta-based comparison against the known insert sizes stored in the synthetic read headers.

Header Parsing Strategy
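A minimal sketch of header parsing, assuming each header carries a token of the form insert=<N> as in the "insert=250" example given for in=. The function name parse_insert is hypothetical, and actual RandomReads/RenameReads headers may be laid out differently.

```python
import re
from typing import Optional

# Assumed token format: "insert=" followed by digits, starting at a word boundary.
_INSERT_RE = re.compile(r"\binsert=(\d+)")


def parse_insert(header: str) -> Optional[int]:
    """Return the insert size encoded in a synthetic read header, or None."""
    match = _INSERT_RE.search(header)
    return int(match.group(1)) if match else None


print(parse_insert("insert=250 1:"))
```

Returning None instead of raising lets a caller count reads whose headers lack insert information separately.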

Quality Classification

Each merged read is classified by comparing its actual length with its expected insert size.
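Using the delta=insert-initialLength1 definition from the merged-read analysis, a read whose length equals its insert gives delta 0, a read that is too short gives a positive delta, and one that is too long gives a negative delta. This sketch encodes that three-way split; the function name and category labels are hypothetical, not GradeMerge's own output names.

```python
def classify_merge(insert: int, merged_length: int) -> str:
    """Classify one merged read against its known insert size (hypothetical labels)."""
    delta = insert - merged_length
    if delta == 0:
        return "correct"      # merged length matches the true insert exactly
    if delta > 0:
        return "too_short"    # merged read is shorter than the true insert
    return "too_long"         # merged read is longer than the true insert


print(classify_merge(250, 240))
```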

Statistical Analysis

The tool computes statistics by keeping a counter for each classification category.
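One way to turn per-category counters into a report is to express each count as a percentage of all graded reads. This is a hedged sketch: summarize is a hypothetical name, the labels are illustrative, and the exact fields GradeMerge prints may differ.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(labels: Iterable[str]) -> Dict[str, float]:
    """Convert a stream of category labels into percentages of the total."""
    counts = Counter(labels)  # one counter per classification category
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


print(summarize(["correct", "correct", "too_short", "too_long"]))
```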

Dual Processing Strategy

When raw reads are provided, GradeMerge performs two-stage analysis using ConcurrentReadInputStream:

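  1. Raw Analysis: Determines which read pairs could theoretically be merged. A pair counts as mergeable when its insert size is less than r1.pairLength(), the combined length of both reads, so that the mates overlap.
  2. Merged Analysis: Evaluates the reads that were actually merged by computing delta=insert-initialLength1, where initialLength1 is the length of the merged read.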
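The raw-stage test can be sketched over (insert, read 1 length, read 2 length) tuples, with len1 + len2 standing in for r1.pairLength(). The function name and input format are assumptions for illustration; GradeMerge itself reads pairs through ConcurrentReadInputStream.

```python
from typing import Iterable, Tuple


def mergeable_fraction(pairs: Iterable[Tuple[int, int, int]]) -> float:
    """Fraction of pairs whose insert is shorter than the summed read lengths."""
    total = 0
    mergeable = 0
    for insert, len1, len2 in pairs:
        total += 1
        pair_length = len1 + len2  # analogous to r1.pairLength()
        if insert < pair_length:   # mates overlap by at least one base
            mergeable += 1
    return mergeable / total if total else 0.0


print(mergeable_fraction([(250, 150, 150), (300, 150, 150)]))
```

With 2x150 bp reads, a 250 bp insert overlaps and counts as mergeable, while a 300 bp insert only abuts and does not.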

Performance Characteristics

Output Metrics

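GradeMerge reports statistics for each classification category and, when raw reads are supplied, the percentage of raw read pairs that were theoretically mergeable.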
