InvertKey

Script: invertkey.sh
Package: sketch
Class: InvertKey.java

Inverts a sketch key, given a matching reference. This tool takes a reference sequence and a sketch key (or set of keys) and finds the k-mer sequences that generated those hash values.

Basic Usage

invertkey.sh in=<reference> key=<key> k=<31>

The reference file must contain the sequences that the sketch keys were generated from. The k-mer length must match the k used to generate the original sketch.

Parameters

Parameters are organized by their function in the key inversion process. All parameters from the shell script usage are documented below.

I/O parameters

in=<file>
Input reference file containing sequences to search for k-mers. Required parameter.
key=<key>
Sketch key or comma-separated list of keys to invert. Can also be a .sketch file. Required parameter.
k=<31>
K-mer length for matching. Must match the k value used to generate the original sketch. Default: 31
out=<file>
Output file for inverted k-mers. Default: stdout.fa (FASTA written to standard output).
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file.

Processing parameters

verbose=f
Enable verbose output for debugging and monitoring progress.
printonce=t
When true, stops after finding the first occurrence of each key. When false, reports all occurrences.

Java Parameters

-Xmx
Sets Java's maximum heap size, overriding autodetection. For example, -Xmx20g requests 20 GB of RAM and -Xmx200m requests 200 MB. The maximum is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Basic Key Inversion

invertkey.sh in=reference.fasta key=A12B34C56D78 k=31

Inverts a single sketch key using a reference FASTA file with k=31.

Multiple Key Inversion

invertkey.sh in=reference.fasta key=A12B34C56D78,X98Y76Z54W32 k=31 out=inverted.fasta

Inverts multiple comma-separated sketch keys and outputs results to a file.

From Sketch File

invertkey.sh in=reference.fasta key=sample.sketch k=31 printonce=f

Inverts all keys from a sketch file and reports all occurrences (not just the first).

Algorithm Details

InvertKey does not invert the hash function mathematically. Instead, it performs a reverse lookup: the invert() method scans the reference linearly, hashes each k-mer, and reports the k-mers whose hash values match the requested keys.
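The scan-and-match idea can be illustrated with a minimal Java sketch. It is not the InvertKey source: the class name InvertScanSketch, the toyHash mixer (a stand-in for the real sketch hash), and the choice of the larger of the forward and reverse-complement encodings as the canonical value are all assumptions made for illustration. For simplicity it re-encodes every window instead of rolling, and it restricts k to 31 or less.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the InvertKey source code.
class InvertScanSketch {

    // Stand-in 64-bit mixer (assumption); the real tool uses the sketch hash.
    static long toyHash(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        return x;
    }

    // 2-bit code for A/C/G/T (either case), or -1 for any other symbol.
    static int code(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    // Larger of the forward and reverse-complement encodings (assumed
    // canonical form), or -1 if the window holds an invalid base.
    // Valid encodings are non-negative because k <= 31 here.
    static long canonical(String window) {
        final int k = window.length();
        long fwd = 0, rev = 0;
        for (int i = 0; i < k; i++) {
            final int x = code(window.charAt(i));
            if (x < 0) return -1;
            fwd = (fwd << 2) | x;
            rev = (rev >>> 2) | ((long) (3 - x) << (2 * (k - 1)));
        }
        return Math.max(fwd, rev);
    }

    // Linear scan: hash every valid window and collect those matching a key.
    // With printOnce, a key is dropped after its first hit.
    static List<String> invert(String ref, int k, Set<Long> keys, boolean printOnce) {
        if (k < 1 || k > 31) throw new IllegalArgumentException("this sketch supports k in 1..31");
        final Set<Long> remaining = new HashSet<>(keys);
        final List<String> hits = new ArrayList<>();
        for (int i = 0; i + k <= ref.length() && !remaining.isEmpty(); i++) {
            final String window = ref.substring(i, i + k);
            final long c = canonical(window);
            if (c < 0) continue; // window contains N or another invalid base
            final long key = toyHash(c);
            if (remaining.contains(key)) {
                hits.add(window);
                if (printOnce) remaining.remove(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Long> keys = new HashSet<>();
        keys.add(toyHash(canonical("CGTA")));
        System.out.println(invert("ACGTACGT", 4, keys, true));  // [CGTA]
        System.out.println(invert("ACGTACGT", 4, keys, false)); // [CGTA, TACG]
    }
}
```

In this toy run, TACG is reported in the second call because it is the reverse complement of CGTA, so both windows share one canonical value and therefore one key. This mirrors the printonce=t versus printonce=f behavior described under Processing parameters.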

Key Processing Strategy

K-mer Scanning Process

Output Formatting

Performance Characteristics

Technical Implementation

The scan maintains a rolling k-mer encoding as two 64-bit values: the forward k-mer (kmer) and its reverse complement (rkmer). Bases are encoded at 2 bits each using the AminoAcid.baseToNumber lookup arrays. An invalid base (N or another ambiguity code) resets the length counter and clears the reverse k-mer state, so no k-mer spanning that base is hashed. Each complete k-mer is hashed with the same function used during sketch generation (SketchObject.hash()), which ensures identical hash values for matching k-mers. Because each base occupies 2 bits of a 64-bit integer, k-mer lengths are limited to at most 32 bases.
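The rolling update described above might look roughly like the following Java sketch. The class RollingKmerSketch and its local lookup table are hypothetical stand-ins (the real code uses AminoAcid.baseToNumber, which is not reproduced here), and hashing is left out so that only the 2-bit forward and reverse-complement bookkeeping is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the InvertKey source code.
class RollingKmerSketch {

    // Local stand-in for a base-to-number table: A=0, C=1, G=2, T=3, else -1.
    static final byte[] BASE_TO_NUMBER = new byte[128];
    static {
        Arrays.fill(BASE_TO_NUMBER, (byte) -1);
        BASE_TO_NUMBER['A'] = BASE_TO_NUMBER['a'] = 0;
        BASE_TO_NUMBER['C'] = BASE_TO_NUMBER['c'] = 1;
        BASE_TO_NUMBER['G'] = BASE_TO_NUMBER['g'] = 2;
        BASE_TO_NUMBER['T'] = BASE_TO_NUMBER['t'] = 3;
    }

    // Returns {kmer, rkmer} for every complete, valid k-mer window, in order.
    static List<long[]> roll(String seq, int k) {
        if (k < 1 || k > 32) throw new IllegalArgumentException("k must be in 1..32");
        final int shift2 = 2 * (k - 1);                        // slot of the newest rkmer base
        final long mask = (k == 32) ? -1L : ~(-1L << (2 * k)); // keeps the low 2k bits
        long kmer = 0, rkmer = 0;
        int len = 0;                                           // valid bases since last reset
        final List<long[]> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); i++) {
            final char c = seq.charAt(i);
            final int x = (c < 128) ? BASE_TO_NUMBER[c] : -1;
            if (x < 0) {            // invalid base: restart the window
                len = 0;
                kmer = 0;
                rkmer = 0;
                continue;
            }
            kmer = ((kmer << 2) | x) & mask;                    // append base on the right
            rkmer = (rkmer >>> 2) | ((long) (3 - x) << shift2); // prepend complement on the left
            len++;
            if (len >= k) out.add(new long[] {kmer, rkmer});
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] pair : roll("ACGTA", 4)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
        // prints:
        // 27 27
        // 108 198
    }
}
```

The k == 32 special case is needed because Java reduces a long shift distance modulo 64, so -1L << 64 would leave the value unchanged instead of producing zero. At k = 32 the two values use all 64 bits, which is why longer k-mers cannot be represented this way.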

Support

For questions and support: