BloomFilterParser

Script: bloomfilterparser.sh
Package: bloom
Class: ParseBloomFilter.java

Parses the verbose output of bloomfilter.sh and tabulates it. The tool was written for a specific paper, so it is irrelevant for most users but useful for reproducing the published results.

Basic Usage

bloomfilterparser.sh in=<input file> out=<output file>

The input file should contain whatever bloomfilter.sh prints to the screen (for example, in=slurm-3249652.out out=summary.txt). If you add the verbose flag to bloomfilter.sh, that output also includes details of calls to increment().

Parameters

This tool has a limited set of parameters focused on parsing bloomfilter output.

Input/Output Parameters

in=file
Input text file containing the verbose screen output captured from a bloomfilter.sh run.
out=file
Output file for the parsed and tabulated results. Default is "stdout.txt" if not specified.
invalid=file
Optional output file for lines that could not be parsed, that is, lines that do not match any expected pattern.

Processing Parameters

lines=<integer>
Maximum number of lines to process. If negative or not specified, processes all lines in the input file.
verbose=<boolean>
Enable verbose output during parsing. Shows detailed processing information. Default: false
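
The lines= rule described above (a negative or missing value means "process everything") can be illustrated with a minimal Python sketch. The function name limit_lines is hypothetical and is not part of ParseBloomFilter.java; it only mirrors the documented behavior.

```python
from itertools import islice
from typing import Iterable, Iterator


def limit_lines(lines: Iterable[str], max_lines: int = -1) -> Iterator[str]:
    """Yield at most max_lines lines; a negative limit means no limit.

    Hypothetical helper that mirrors the documented lines= semantics.
    """
    if max_lines < 0:
        return iter(lines)
    return islice(lines, max_lines)


sample = ["line 1\n", "line 2\n", "line 3\n"]
print(len(list(limit_lines(sample, 2))))   # limit of 2
print(len(list(limit_lines(sample, -1))))  # no limit
```

Because islice is lazy, the limit stops reading as soon as it is reached instead of loading the whole file first.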

Standard Parameters

overwrite=true
Allow overwriting of existing output files.
append=false
Append to existing output files instead of overwriting.
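
How the two flags might interact can be sketched in Python as follows. This is an assumption about the semantics, not a description of the Java implementation: the sketch lets append take precedence and raises an error when the file exists and overwriting is disabled. The helper name open_output is hypothetical.

```python
import os
import tempfile


def open_output(path: str, overwrite: bool = True, append: bool = False):
    """Open an output file following overwrite/append style flags.

    Assumed semantics: append wins if set; otherwise an existing file
    is only replaced when overwrite is true.
    """
    if append:
        return open(path, "a", encoding="utf-8")
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists and overwrite=false")
    return open(path, "w", encoding="utf-8")


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary.txt")
    with open_output(path) as fh:
        fh.write("first run\n")
    with open_output(path, append=True) as fh:
        fh.write("second run\n")
    with open(path, encoding="utf-8") as fh:
        print(fh.read(), end="")
```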

Examples

Basic Parsing

bloomfilterparser.sh in=slurm-3249652.out out=summary.txt

Parses the output from a SLURM job that ran bloomfilter.sh and creates a summary table of the results.

With Invalid Line Capture

bloomfilterparser.sh in=bloomfilter_output.txt out=parsed_results.txt invalid=unparsed_lines.txt

Parses bloomfilter output while capturing any lines that couldn't be parsed into a separate file for review.
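
Conceptually, the invalid= option splits the input into two streams. The Python sketch below shows that routing idea only; route_lines and the is_parsable predicate are hypothetical stand-ins, and the real tool's parsing rules are not reproduced here.

```python
import io
from typing import Callable, Iterable, Optional, TextIO


def route_lines(
    lines: Iterable[str],
    is_parsable: Callable[[str], bool],
    out: TextIO,
    invalid: Optional[TextIO] = None,
) -> int:
    """Write parsable lines to out and the rest to invalid, if given.

    Returns the number of rejected lines.
    """
    rejected = 0
    for line in lines:
        if is_parsable(line):
            out.write(line)
        else:
            rejected += 1
            if invalid is not None:
                invalid.write(line)
    return rejected


# Placeholder rule for the demo: a line is "parsable" if it contains '='.
parsed, unparsed = io.StringIO(), io.StringIO()
count = route_lines(["k=31\n", "random banner text\n"], lambda s: "=" in s, parsed, unparsed)
print(count, "line(s) sent to the invalid stream")
```

Reviewing the invalid file after a run is a quick way to check whether the input really came from a verbose bloomfilter.sh run.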

Limited Line Processing

bloomfilterparser.sh in=large_output.txt out=results.txt lines=1000

Processes only the first 1000 lines of the input file, which is useful for testing or for very large output files.

Algorithm Details

BloomFilterParser is a specialized text processing tool designed to extract structured data from bloomfilter.sh verbose output for research reproducibility.

Parsing Strategy

The parser uses pattern matching to identify and extract specific types of information from bloomfilter.sh output.
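
As a rough illustration of pattern-based extraction, the Python sketch below matches lines of the form "label: number". That line shape is purely hypothetical, chosen for the example; the actual patterns used by ParseBloomFilter.java and the real bloomfilter.sh output format are not documented here.

```python
import re
from typing import Optional, Tuple

# Hypothetical line shape "Some label: 123.4"; real bloomfilter.sh output may differ.
FIELD = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_line(line: str) -> Optional[Tuple[str, float]]:
    """Return (label, value) if the line matches the pattern, else None."""
    match = FIELD.match(line)
    if match is None:
        return None
    return match.group("label"), float(match.group("value"))


print(parse_line("Example metric: 42"))
print(parse_line("a line that does not match"))
```

Returning None for non-matching lines leaves it to the caller to decide whether to skip them or record them elsewhere.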

Output Format

The tool converts verbose bloomfilter.sh output into a tabular format suitable for analysis.
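
A minimal Python sketch of the tabulation step is shown below. The tab delimiter and the column names (run, metric, value) are assumptions made for illustration, not the tool's actual output layout.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def write_table(rows: Iterable[Dict[str, object]], columns: List[str], out: TextIO) -> None:
    """Write a header row followed by one tab-delimited row per record."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing fields become empty cells so every row has the same width.
        writer.writerow([row.get(col, "") for col in columns])


# Hypothetical records and column names, for demonstration only.
records = [
    {"run": 1, "metric": "example_a", "value": 0.5},
    {"run": 2, "metric": "example_b"},
]
buffer = io.StringIO()
write_table(records, ["run", "metric", "value"], buffer)
print(buffer.getvalue(), end="")
```

A fixed column list keeps the table rectangular even when some records lack a field, which makes the result easier to load into a spreadsheet or a statistics package.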

Memory Usage

The parser uses minimal memory (default 300MB) and processes files line by line, making it suitable for large bloomfilter output files. Memory usage is configured via standard Java heap parameters.

Use Case

This tool is specifically designed for researchers who need to reproduce published results involving bloom filters. It was created for a specific paper and extracts exactly the metrics needed for that research. While not generally useful, it demonstrates how to systematically parse complex bioinformatics tool output for downstream analysis.

Support

For questions and support: