RenameBySketch

Script: renamebysketch.sh
Package: bin
Class: FileRenamer.java

Renames fasta files by prepending a TaxID to each filename, based on SendSketch results. Designed for metagenome binning evaluation and synthetic read generation.

Basic Usage

renamebysketch.sh *.fa

Input may be fasta or fastq, compressed or uncompressed. Files will be renamed with the format: tid_[TAXID]_[original_filename]

Parameters

RenameBySketch uses standard Java parameters for memory management and execution control. The tool automatically processes all input files using SendSketch to determine taxonomic identity.

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default: 4g
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Rename Multiple Assemblies

renamebysketch.sh bin1.fa bin2.fa bin3.fa

Processes each assembly file, identifies the top taxonomic hit using SendSketch, and renames files with their taxonomic ID. For example, bin1.fa might become tid_562_bin1.fa if identified as E. coli (TaxID 562).

Process All FASTA Files in Directory

renamebysketch.sh *.fa

Batch processes all .fa files in the current directory, renaming each based on its taxonomic identification.

With Custom Memory Settings

renamebysketch.sh -Xmx8g assembly1.fasta assembly2.fasta

Processes the assemblies with 8 GB of allocated memory, which is useful for large genome files that need more memory for sketch comparison.

Algorithm Details

FileRenamer identifies each input file with a MinHash sketch-based workflow: files are processed through the SketchTool and SendSketch components and matched against the NCBI RefSeq database.

Processing Strategy

The implementation uses a four-stage processing pipeline with direct class method invocations:

JSON Response Processing

The SketchRecord constructor extracts specific fields from JsonObject responses using typed accessor methods:

File Processing Implementation

The main loop processes the filenames in args sequentially, one args[i] at a time, with validation and error handling.
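A sequential loop of this kind can be sketched in shell as a dry run that only prints the planned renames. This is an illustration, not the code in FileRenamer.java: plan_renames and lookup_taxid are hypothetical names, lookup_taxid is a stub that returns a fixed TaxID instead of querying SendSketch, and the two checks shown (the file exists, a TaxID was returned) are assumptions about what the validation might cover.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the SendSketch query. It returns a fixed
# TaxID (562) so the sketch runs offline; the real tool gets the TaxID
# from the top SendSketch hit.
lookup_taxid() {
  echo 562
}

# Dry run: print the rename that would be performed for each argument.
plan_renames() {
  local f taxid base dir
  for f in "$@"; do
    # Assumed validation step: skip anything that is not a regular file.
    if [ ! -f "$f" ]; then
      echo "skipping $f: not a regular file" >&2
      continue
    fi
    taxid="$(lookup_taxid "$f")"
    # Assumed error handling: skip files for which no TaxID came back.
    if [ -z "$taxid" ]; then
      echo "skipping $f: no TaxID found" >&2
      continue
    fi
    base="${f##*/}"        # filename without any directory part
    dir="${f%"$base"}"     # directory part with trailing slash, or empty
    echo "mv -- $f ${dir}tid_${taxid}_${base}"
  done
}

# Usage (prints one planned rename per existing file):
#   plan_renames bin1.fa bin2.fa bin3.fa
```

Printing the mv command instead of running it keeps the sketch safe to try on real data; files are handled one at a time, so a problem with one input does not stop the rest.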

Use Cases

FileRenamer is designed for two workflows: metagenome binning evaluation and synthetic read generation.

Implementation Characteristics

The sketch-based approach exhibits specific computational and memory properties:

Output Format

Files are renamed using the pattern: tid_[TAXID]_[original_filename]

Example Transformations
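As a minimal illustration of the tid_[TAXID]_[original_filename] pattern, the shell sketch below builds the new name from a TaxID and a filename. The function name tid_name is hypothetical, and the treatment of directory components (prefix the base filename, keep the directory) is an assumption that this guide does not confirm. TaxID 562 and bin1.fa are taken from the E. coli example in the Examples section.

```shell
#!/usr/bin/env bash
# Illustrative only: builds the documented name tid_[TAXID]_[original_filename].
# Assumption: the prefix goes on the base filename and any directory part
# is preserved; the guide does not say how paths are handled.
tid_name() {
  local taxid="$1" path="$2"
  local base="${path##*/}"            # filename without directory
  if [ "$base" = "$path" ]; then
    printf 'tid_%s_%s\n' "$taxid" "$base"
  else
    local dir="${path%/*}"            # directory without trailing slash
    printf '%s/tid_%s_%s\n' "$dir" "$taxid" "$base"
  fi
}

tid_name 562 bin1.fa            # prints tid_562_bin1.fa
tid_name 562 bins/bin1.fa.gz    # prints bins/tid_562_bin1.fa.gz
```

The original filename, including its extension and any compression suffix, is kept intact after the prefix, so the original name can be recovered by stripping everything up to the second underscore.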

Taxonomic ID Assignment

Each file is assigned the TaxID of its top SendSketch hit.

Dependencies

RenameBySketch requires network access to function properly, because taxonomic identification relies on SendSketch queries.

Support

For questions and support: