SummarizeCrossblock

Script: summarizecrossblock.sh
Package: driver
Class: SummarizeCrossblock.java

Summarizes CrossBlock results. Used for testing and validating CrossBlock by aggregating statistics from multiple result files into a single summary table.

Basic Usage

summarizecrossblock.sh in=<input file> out=<output file>

This tool processes one or more CrossBlock result files and generates a summary table showing statistics for each input file: contig counts, base counts, and the number of contigs and bases discarded during CrossBlock processing.

Parameters

Parameters are organized by their function in the summarization process. All parameters from the shell script are documented below.

Standard parameters

in=<file>
A text file listing filenames (one path per line), or a comma-delimited list of files. Each path points to a results.txt file produced by CrossBlock.
out=<file>
Output file for the summary. Will contain tab-delimited data with columns: filename, copies, contigs, contigsDiscarded, bases, basesDiscarded. If not specified, output goes to stdout.
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file. The default is false, so existing files are protected; set to true to allow overwriting.

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default is 200m for this lightweight tool.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+. Useful for automated pipelines where clean failure handling is important.
-da
Disable assertions. Can provide minor performance improvement in production environments where assertion checking is not needed.
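
For an automated pipeline, the flags above might be combined in a small wrapper such as the following sketch. The function name run_summary and the file names passed to it are hypothetical; only -Xmx, -eoom, in=, out=, and overwrite= come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the summary with explicit memory settings and
# return the tool's exit status so the calling pipeline can react to it.
run_summary() {
    local list="$1" out="$2"
    # Fail early with the conventional "command not found" status.
    if ! command -v summarizecrossblock.sh >/dev/null 2>&1; then
        echo "summarizecrossblock.sh not found on PATH" >&2
        return 127
    fi
    # -Xmx200m matches the documented default; -eoom exits on out-of-memory.
    summarizecrossblock.sh -Xmx200m -eoom in="$list" out="$out" overwrite=t
}
```

A caller could then write: run_summary filelist.txt batch_summary.txt || echo "summary failed" >&2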

Examples

Basic Summary of Single CrossBlock Result

summarizecrossblock.sh in=crossblock_results.txt out=summary.txt

Processes a single CrossBlock result file and writes the summary to summary.txt.

Summary of Multiple Result Files

summarizecrossblock.sh in=results1.txt,results2.txt,results3.txt out=combined_summary.txt

Processes multiple CrossBlock result files specified as a comma-delimited list and creates a combined summary.
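
When the result files are matched by a shell glob, the comma-delimited list can be assembled rather than typed. A minimal sketch, assuming Bash and paths that contain no commas; join_paths is a hypothetical helper, not part of the tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join its arguments with commas for use with in=.
join_paths() {
    local IFS=,        # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}
```

For example, summarizecrossblock.sh in="$(join_paths run*/results.txt)" out=combined_summary.txt would pass every matching file. The run*/ directory layout is illustrative only.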

Batch Processing with File List

echo -e "sample1_results.txt\nsample2_results.txt\nsample3_results.txt" > filelist.txt
summarizecrossblock.sh in=filelist.txt out=batch_summary.txt

Creates a file listing multiple CrossBlock result files, then processes them all to create a batch summary.
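
If each CrossBlock run writes its results.txt into its own directory, the file list can be generated instead of written by hand. The sketch below assumes that directory layout, and make_filelist is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect every results.txt under a run directory
# into a file of filenames, one path per line.
# Assumption: one subdirectory per sample, each holding a results.txt.
make_filelist() {
    local rundir="$1" listfile="$2"
    # Sort the paths so the order of the list is stable between runs.
    find "$rundir" -type f -name 'results.txt' | sort > "$listfile"
}
```

The resulting list can be passed directly, for instance after make_filelist crossblock_runs filelist.txt, as in=filelist.txt.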

Output to Console

summarizecrossblock.sh in=crossblock_results.txt

Processes the result file and prints the summary to standard output for immediate viewing or piping to other tools.

Output Format

The output is a tab-delimited table with the following columns: filename, copies, contigs, contigsDiscarded, bases, basesDiscarded.

If an error occurs processing a specific file, the output will show "ERROR" instead of numeric values for that file.
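
The table can be post-processed with standard tools. The sketch below totals the discarded contigs and bases with awk. It assumes the column order given in the out= description and that any header line begins with "#"; neither the header format nor the exact position of "ERROR" is confirmed here. summarize_discards is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total the discarded contigs and bases in a summary table.
# Assumed columns: 1 filename, 2 copies, 3 contigs, 4 contigsDiscarded,
#                  5 bases, 6 basesDiscarded
summarize_discards() {
    awk -F '\t' '
        /^#/ || NF == 0 { next }             # skip header/comment and blank lines
        {
            bad = 0
            for (i = 2; i <= NF; i++) if ($i == "ERROR") bad = 1
        }
        bad { errors++; next }               # file that could not be processed
        $4 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            contigs += $4; bases += $6; files++
        }
        END {
            # %.0f keeps large base counts intact in awk versions with 32-bit %d
            printf "files=%d errors=%d contigsDiscarded=%.0f basesDiscarded=%.0f\n",
                   files, errors, contigs, bases
        }
    ' "$1"
}
```

Calling summarize_discards summary.txt prints a single line of totals. Rows flagged with ERROR are counted separately rather than added to the sums.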

Algorithm Details

SummarizeCrossblock implements a straightforward aggregation algorithm for CrossBlock validation and testing:

Processing Strategy

Input File Format Requirements

CrossBlock result files must contain tab-delimited data where:

Memory Usage

This tool has minimal memory requirements since it processes result files sequentially without storing large data structures. The default 200MB memory allocation is sufficient for most use cases, even when processing hundreds of result files.

Performance Characteristics

Related Tools

This tool is part of the CrossBlock validation and testing workflow:

Support

For questions and support: