ProcessFrag

Script: processfrag.sh
Package: driver
Class: ProcessFragMerging.java

Reformats output from a script. Made for generating the BBMerge paper data.

Basic Usage

processfrag.sh <file>

Takes a single input file containing the script output to be reformatted. The tool was written to process and collate the data used in the BBMerge research paper.

Parameters

ProcessFrag is a specialized utility with minimal parameters. It primarily uses standard Java runtime parameters for memory management.

Java Parameters

-Xmx
Sets Java's memory usage, overriding autodetection. For example, -Xmx20g specifies 20 GB of RAM and -Xmx200m specifies 200 MB. The maximum is typically 85% of physical memory. Default: 100m
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Basic Usage

# Process BBMerge comparison output
processfrag.sh comparison_results.txt

# Process with custom memory allocation
processfrag.sh -Xmx2g comparison_results.txt

The first command processes a file containing BBMerge comparison results. The second allocates 2 GB of memory for processing larger datasets.

Algorithm Details

Data Processing Strategy

ProcessFrag processes its input line by line, reformatting BBMerge comparison output into a structured tabular format. It uses pattern matching to extract key metrics from several types of output line.
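
The line-by-line strategy described above can be sketched roughly as follows. This is a minimal illustration, not the logic of ProcessFragMerging.java: the line prefixes ("Dataset:", "Time:", "Reads Used:", "Mapped:"), the regular expressions, and the function name reformat are all hypothetical, chosen only to show how labelled lines can be collapsed into one tab-delimited row per dataset.

```python
import re

# Hypothetical line patterns. The real input format is not documented here,
# so these prefixes and regexes are illustrative assumptions only.
PATTERNS = {
    "time_seconds": re.compile(r"^Time:\s+([\d.]+)"),
    "reads_used": re.compile(r"^Reads Used:\s+(\d+)"),
    "mapped": re.compile(r"^Mapped:\s+(\d+)"),
}


def reformat(lines):
    """Collapse labelled lines into one tab-delimited row per dataset."""
    rows = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Dataset:"):  # hypothetical record delimiter
            if current is not None:
                rows.append(current)
            current = {"name": line.split(":", 1)[1].strip()}
            continue
        if current is None:
            continue  # ignore anything before the first dataset header
        for field, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                current[field] = match.group(1)
                break
    if current is not None:
        rows.append(current)
    columns = ["name"] + list(PATTERNS)
    # Metrics that never appeared for a dataset are left as empty cells.
    return ["\t".join(row.get(col, "") for col in columns) for row in rows]
```

Lines that match no pattern are skipped silently, which lets a sketch like this tolerate banner text or blank lines between the lines of interest.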

Extracted Metrics

Output Format

The algorithm produces tab-delimited output with one row per dataset. The 14 columns are listed in the Output Format section below.

Processing Characteristics

Research Application

This tool was specifically developed for the BBMerge research paper to standardize the format of comparison data across multiple alignment tools and parameter sets. The consistent tabular output facilitates statistical analysis and visualization of tool performance metrics.

Input Format

ProcessFrag expects input files containing the specific line patterns produced by the BBMerge comparison scripts.

Output Format

The tool generates tab-delimited output suitable for spreadsheet import or further statistical analysis. Each dataset produces one row with the following columns:

  1. Dataset name
  2. Processing time (seconds)
  3. Reads used count
  4. Reads used percentage
  5. Mapped count
  6. Mapped percentage
  7. Overall error rate percentage
  8. Overall error count
  9. Substitution rate percentage
  10. Substitution count
  11. Deletion rate percentage
  12. Deletion count
  13. Insertion rate percentage
  14. Insertion count
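
For downstream analysis, the 14 columns above can be read into named fields. The sketch below assumes the output has no header line and that every row has exactly 14 tab-separated values. The field names and the read_rows helper are my own labels, not names used by the tool, and the example row contains made-up values.

```python
import csv
import io

# Field names are illustrative labels for the 14 columns listed above.
COLUMNS = [
    "dataset", "time_seconds",
    "reads_used", "reads_used_pct",
    "mapped", "mapped_pct",
    "error_pct", "error_count",
    "sub_pct", "sub_count",
    "del_pct", "del_count",
    "ins_pct", "ins_count",
]


def read_rows(handle):
    """Yield one dict per dataset row; assumes no header and 14 columns."""
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, record))


# Made-up row for illustration only (not real BBMerge results).
example = "demo\t12.5\t1000\t100.0\t950\t95.0\t0.5\t40\t0.3\t25\t0.1\t8\t0.1\t7\n"
rows = list(read_rows(io.StringIO(example)))
print(rows[0]["dataset"], rows[0]["mapped_pct"])
```

Values are kept as strings because the exact numeric formatting of the percentage columns is not specified here; convert them once the format of real output is known.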

Support

For questions and support: