RunHMM

Script: runhmm.sh
Package: hmm
Class: HMMSearchReport.java

Parses HMMER search output files using manual byte array processing, extracting 23 fields per hit line into HMMSearchLine objects. Hits are grouped by query protein name in a HashMap<String, ProteinSummary>, and only the longest hit length per protein-model combination is retained to limit memory use.

Basic Usage

runhmm.sh in=<file> out=<file>

Reads the HMM search output file line by line using ByteFile.nextLine() and parses each non-comment line into an HMMSearchLine object. Each line is split into 23 fields by advancing two pointers (variables a and b) across the byte array, detecting space delimiters, and converting numeric fields to integers, floats, or doubles with Parse methods.
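
The line-reading step can be illustrated with a minimal Python sketch. It is not the ByteFile implementation; the function name hit_lines is hypothetical, and treating lines that start with '#' as comments is an assumption based on HMMER's tabular output conventions.

```python
import io

def hit_lines(stream):
    """Yield non-empty, non-comment lines from a binary stream (illustrative only)."""
    for raw in stream:
        # Drop the line terminator but keep the bytes otherwise untouched.
        line = raw.rstrip(b"\r\n")
        # Assumption: comment lines begin with '#'.
        if not line or line.startswith(b"#"):
            continue
        yield line

if __name__ == "__main__":
    demo = io.BytesIO(b"# header\nprot1 - 350\n\nprot2 - 120\n")
    for line in hit_lines(demo):
        print(line)
```

Each yielded line would then be handed to the field parser.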

Parameters

RunHMM takes few parameters because it processes a single standard HMM search format and produces consistently formatted output.

File I/O Parameters

in=<file>
Input HMM search results file. Should be in standard HMMER output format with space-delimited fields containing protein names, model information, coordinates, and scores.
out=<file>
Output file for processed results. If not specified, results are written to stdout.

Processing Parameters

ow=f
(overwrite) Set to true to overwrite existing output files. Default: false.
verbose=f
Enable verbose output for debugging and detailed processing information. Default: false.
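
As an illustration of how these key=value parameters and their defaults fit together, here is a small Python sketch. It is not RunHMM's argument parser; the function parse_args is hypothetical, and accepting "t"/"true" as the true values is an assumption drawn from the ow=t example in this document.

```python
def parse_args(argv):
    """Parse key=value arguments into a dict of options (illustrative only)."""
    # Defaults mirror the documented ones: no files set, ow=f, verbose=f.
    opts = {"in": None, "out": None, "ow": False, "verbose": False}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in opts:
            raise ValueError(f"unrecognized argument: {arg}")
        if key in ("ow", "verbose"):
            # Assumption: 't' or 'true' (any case) enables a flag.
            opts[key] = value.lower() in ("t", "true")
        else:
            opts[key] = value
    return opts

if __name__ == "__main__":
    print(parse_args(["in=domain_search.out", "out=summary.txt", "ow=t"]))
```

In this sketch an "out" value of None stands for the documented fallback of writing to stdout.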

Examples

Basic HMM Results Processing

runhmm.sh in=hmmsearch_output.txt out=processed_results.txt

Processes HMM search results from a HMMER output file, creating organized protein summaries.

Processing with Overwrite

runhmm.sh in=domain_search.out out=summary.txt ow=t

Processes domain search results, overwriting any existing output file.

Standard Pipeline Usage

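# First run HMM search (external tool); --domtblout writes the space-delimited
# per-domain table whose columns match the 23-field layout described below
hmmsearch --domtblout search_results.txt protein_models.hmm query_sequences.faa > /dev/null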

# Then process results with runhmm
runhmm.sh in=search_results.txt out=organized_hits.txt

Typical workflow showing HMM search followed by results processing.

Algorithm Details

HMM Search Line Parsing

RunHMM parses HMMER output byte by byte, advancing two pointers (variables a and b) across space-delimited fields and extracting 23 fields from each hit line with Parse.parseInt(), Parse.parseDouble(), and Parse.parseFloat().
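
The pointer-advancement idea can be sketched in Python as follows. This is a simplified illustration, not the HMMSearchLine code: the function split_fields is hypothetical, and letting the final field absorb the rest of the line (so a description may contain spaces) is an assumption.

```python
def split_fields(line: bytes, max_fields: int = 23) -> list:
    """Split a space-delimited byte line using two advancing indices, a and b."""
    fields = []
    a = 0  # start of the current field
    n = len(line)
    while a < n and len(fields) < max_fields - 1:
        # Skip the run of spaces in front of the next field.
        while a < n and line[a] == 0x20:
            a += 1
        if a >= n:
            break
        # Advance b to the first space after the field.
        b = a
        while b < n and line[b] != 0x20:
            b += 1
        fields.append(line[a:b])
        a = b
    # Assumption: whatever remains is the last field and may contain spaces.
    while a < n and line[a] == 0x20:
        a += 1
    if a < n:
        fields.append(line[a:].rstrip(b" "))
    return fields

if __name__ == "__main__":
    print(split_fields(b"prot1   -  350 PF00001", max_fields=4))
```

Scanning the byte array directly avoids creating an intermediate string for the whole line; only the slices that become fields are copied.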

Data Organization Strategy

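The tool stores hits in a HashMap<String, ProteinSummary> keyed by query protein name and applies length-based filtering, keeping only the longest hit length for each protein-model combination.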
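
A minimal Python sketch of this grouping strategy follows. The ProteinSummary class here is a hypothetical stand-in for the Java class of the same name, and computing hit length as end - start + 1 over the query coordinates is an assumption, not something the source confirms.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinSummary:
    """Hypothetical stand-in: tracks the longest hit length seen per model."""
    name: str
    longest: dict = field(default_factory=dict)  # model name -> longest hit length

    def add(self, model: str, length: int) -> None:
        # Keep the new length only if it beats the stored one.
        if length > self.longest.get(model, 0):
            self.longest[model] = length

def group_hits(hits):
    """Group (protein, model, start, end) tuples by protein name."""
    summaries = {}
    for protein, model, start, end in hits:
        length = end - start + 1  # assumed definition of hit length
        summary = summaries.get(protein)
        if summary is None:
            summary = summaries[protein] = ProteinSummary(protein)
        summary.add(model, length)
    return summaries

if __name__ == "__main__":
    demo = [("p1", "m1", 10, 50), ("p1", "m1", 5, 100), ("p2", "m1", 1, 10)]
    for name, summary in group_hits(demo).items():
        print(name, summary.longest)
```

Because only one integer is stored per protein-model pair, memory grows with the number of distinct pairs rather than with the number of hit lines.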

Field Processing

Each input line is parsed into typed fields: integers with Parse.parseInt(), floats with Parse.parseFloat(), and doubles with Parse.parseDouble().

Output Generation

Results are printed with System.err.println() calls during processing.

Performance Characteristics

Input Format

RunHMM expects standard HMMER output format with the following characteristics:

Expected Field Structure

Each data line should contain fields in this order:

  1. Query protein name
  2. Model identifier
  3. Sequence length
  4. HMM model name
  5. Accession number
  6. Model length
  7. Full sequence E-value
  8. Full sequence score
  9. Full sequence bias
  10. Best domain number
  11. Domain count
  12. Domain E-value
  13. Independent E-value
  14. Domain score
  15. Domain bias
  16. HMM start coordinate
  17. HMM end coordinate
  18. Query start coordinate
  19. Query end coordinate
  20. Envelope start
  21. Envelope end
  22. Accuracy score
  23. Description field
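
The field order above can be expressed as a typed lookup table. The following Python sketch is illustrative only: the key names and the choice of int or float for each column are assumptions inferred from the field descriptions, not taken from HMMSearchLine.

```python
# (name, converter) pairs in the documented field order; names are hypothetical.
FIELDS = [
    ("query_name", str), ("model_id", str), ("seq_length", int),
    ("hmm_name", str), ("accession", str), ("model_length", int),
    ("full_evalue", float), ("full_score", float), ("full_bias", float),
    ("domain_number", int), ("domain_count", int),
    ("domain_evalue", float), ("independent_evalue", float),
    ("domain_score", float), ("domain_bias", float),
    ("hmm_start", int), ("hmm_end", int),
    ("query_start", int), ("query_end", int),
    ("env_start", int), ("env_end", int),
    ("accuracy", float), ("description", str),
]

def parse_hit(line: str) -> dict:
    """Convert one hit line into a dict of typed values."""
    # Split on whitespace at most 22 times so the description keeps its spaces.
    parts = line.strip().split(None, len(FIELDS) - 1)
    if len(parts) == len(FIELDS) - 1:
        parts.append("")  # tolerate a missing description
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return {name: convert(text) for (name, convert), text in zip(FIELDS, parts)}

if __name__ == "__main__":
    demo = ("prot1 - 350 PF00001 PF00001.1 260 1e-30 100.5 0.1 1 2 "
            "1e-20 2e-20 60.0 0.0 5 120 10 130 8 135 0.95 example domain")
    print(parse_hit(demo)["description"])
```

Keeping the order and types in one table makes it easy to check a file against the expected layout before further processing.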

Output Format

The tool generates summaries organized by query protein.

Technical Notes

Memory Management

RunHMM tracks hits in HashMap-based data structures, retaining only the longest hit length per protein-model combination to limit memory use.

File Processing

Integration with HMM Workflows

This tool is designed to complement standard HMMER workflows by post-processing hmmsearch results.
