TaxServer

Script: taxserver.sh
Package: tax
Class: TaxServer.java

HTTP server for NCBI taxonomy translation and sketch-based sequence identification. Maintains reference sketches and taxonomy databases in memory to provide high-performance remote queries for taxonomic classification and phylogenetic analysis.

Overview

TaxServer is one of the four core BBSketch programs (sketch.sh, comparesketch.sh, sendsketch.sh, taxserver.sh). Its primary purpose is to eliminate the overhead of repeatedly loading large reference datasets by maintaining them in memory as a persistent HTTP service. This enables rapid sketch-based taxonomic identification for remote clients using tools like SendSketch.

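The server provides two main services: taxonomy translation (mapping organism names, taxonomy IDs, GI numbers, and accessions to NCBI taxonomy) and sketch comparison against reference sketches held in memory.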

With a large reference set and single queries, comparison time is dominated by loading the reference database. TaxServer solves this by keeping references in memory, making individual queries extremely fast.

Basic Usage

Taxonomy Server

taxserver.sh tree=tree.taxtree.gz table=gitable.int1d.gz port=1234

Start a basic taxonomy translation server.

Sketch Server with Reference Database

taxserver.sh port=1234 tree=tree.taxtree.gz gi=gitable.int1d.gz refseq*.sketch 1>log.o 2>&1 &

Load RefSeq sketches into memory and listen for sketch comparison queries on port 1234.

After starting the server, clients can query it using SendSketch:

sendsketch.sh in=assembly.fa address=http://localhost:1234/sketch

Parameters

Core Parameters

tree=auto
Path to the taxonomy tree file (taxtree format). Always required. Use "auto" for the default location at LBL.
table=auto
Path to the GI table file (gitable format). Required for GI number support. May also be given as gi=.
accession=null
Comma-delimited accession file paths. Example: prot.accession2taxid.gz,nucl_wgs.accession2taxid.gz
img=null
IMG dump file path for IMG genome database support.
pattern=null
Pattern file for compressed accession storage.
port=3068
HTTP server port number. Default 3068 for taxonomy server.
domain=
Domain name displayed in help messages. Default: taxonomy.jgi-psf.org
dbname=
Database name displayed in responses and help messages.
taxpath=
Base path for taxonomy files when using "auto" parameters.
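The key=value convention used by these parameters can be illustrated with a small parser. This is a hypothetical sketch, not TaxServer's own argument handling: the class `TaxServerArgs` is invented, the gi= alias and the defaults (tree=auto, table=auto, port=3068) come from the list above, and treating bare arguments that end in .sketch as reference files is an assumption based on the usage examples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for taxserver.sh-style arguments (not TaxServer's real code).
class TaxServerArgs {
    final Map<String, String> params = new HashMap<>();
    final List<String> sketchFiles = new ArrayList<>();

    static TaxServerArgs parse(String... args) {
        TaxServerArgs a = new TaxServerArgs();
        a.params.put("tree", "auto");   // documented default
        a.params.put("table", "auto");  // documented default
        a.params.put("port", "3068");   // documented default
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq < 0) {
                // Assumption: bare arguments ending in .sketch are reference sketches;
                // other bare arguments are simply ignored in this sketch.
                if (arg.endsWith(".sketch")) a.sketchFiles.add(arg);
                continue;
            }
            String key = arg.substring(0, eq).toLowerCase(Locale.ROOT);
            String value = arg.substring(eq + 1);
            if (key.equals("gi")) key = "table"; // gi= is an alias for table=
            a.params.put(key, value);
        }
        return a;
    }

    int port() {
        return Integer.parseInt(params.get("port"));
    }
}
```

Note that a wildcard such as refseq*.sketch is expanded by the shell before the program sees its arguments, so a parser like this receives one argument per matching file.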

Sketch Parameters

sketchcomparethreads=16
Maximum comparison threads per connection for sketch operations.
sketchloadthreads=4
Maximum load threads for local fastq file processing.
sketchonly=f
Run in sketch-only mode, disabling taxonomy name hashing.
k=31
Kmer length (1-32). Dual lengths supported for sensitivity: k=31,24
prealloc=f
Preallocate data structures. Use boolean (true/false) or fraction (0.75) for partial allocation.
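The value formats of k= and prealloc= can be made concrete with two small parsing helpers. This is an illustrative sketch, not TaxServer's code: `SketchParams` is a hypothetical class, limiting k= to at most two lengths reflects the "dual lengths" wording, and mapping prealloc=t to a fraction of 1.0 is an assumption.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helpers mirroring the documented k= and prealloc= value formats.
class SketchParams {
    /** Parses "31" or "31,24" into kmer lengths, each required to be in 1..32. */
    static int[] parseK(String value) {
        int[] ks = Arrays.stream(value.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
        if (ks.length < 1 || ks.length > 2) {
            throw new IllegalArgumentException("Expected one or two kmer lengths: " + value);
        }
        for (int k : ks) {
            if (k < 1 || k > 32) throw new IllegalArgumentException("k out of range 1-32: " + k);
        }
        return ks;
    }

    /** Maps t/true to 1.0 (assumed), f/false to 0.0, and a number such as 0.75 to itself. */
    static double parsePrealloc(String value) {
        String v = value.toLowerCase(Locale.ROOT);
        if (v.equals("t") || v.equals("true")) return 1.0;
        if (v.equals("f") || v.equals("false")) return 0.0;
        double fraction = Double.parseDouble(v);
        if (fraction < 0 || fraction > 1) {
            throw new IllegalArgumentException("prealloc fraction must be between 0 and 1: " + value);
        }
        return fraction;
    }
}
```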

Security Parameters

killcode=
Password for remote server shutdown via /kill/password endpoint.
oldcode=
Kill code of a prior server instance. Used with oldaddress to shut that instance down after this server finishes initializing.
oldaddress=
Address to kill prior instance after initialization. Example: taxonomy.jgi-psf.org/kill/
allowremotefileaccess=f
Allow external queries to access server filesystem for local sketching.
allowlocalhost=f
Treat localhost queries as internal without proxy requirements.
addressprefix=128.
IP prefix for internal network identification. Default "128." for LBL.
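How these settings might combine into an access decision is sketched below. The rules are inferred from the parameter descriptions, not taken from TaxServer's source: a client counts as internal if its IP starts with addressprefix, or if it is localhost and allowlocalhost=t; file-based queries are allowed for internal clients, and for external ones only when allowremotefileaccess=t. `AccessControl` and its methods are hypothetical names.

```java
// Hypothetical access-control rules inferred from addressprefix, allowlocalhost,
// and allowremotefileaccess; not TaxServer's actual implementation.
class AccessControl {
    final String addressPrefix;
    final boolean allowLocalhost;
    final boolean allowRemoteFileAccess;

    AccessControl(String addressPrefix, boolean allowLocalhost, boolean allowRemoteFileAccess) {
        this.addressPrefix = addressPrefix;
        this.allowLocalhost = allowLocalhost;
        this.allowRemoteFileAccess = allowRemoteFileAccess;
    }

    /** Internal if the IP starts with the prefix, or it is localhost and allowlocalhost is on. */
    boolean isInternal(String clientIp) {
        if (clientIp.startsWith(addressPrefix)) return true; // plain string prefix match
        boolean localhost = clientIp.equals("127.0.0.1")
                || clientIp.equals("::1")
                || clientIp.equals("0:0:0:0:0:0:0:1");
        return allowLocalhost && localhost;
    }

    /** File-based sketch queries: internal clients, or anyone if remote file access is enabled. */
    boolean mayAccessFiles(String clientIp) {
        return allowRemoteFileAccess || isInternal(clientIp);
    }
}
```

A plain string prefix is simple but coarse: a prefix like "12" would also match 120.x.x.x, which is one reason a prefix ending in a dot, such as "128.", is safer.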

Java Parameters

-Xmx
Maximum Java heap size. Examples: -Xmx20g (20 GB), -Xmx200m (200 MB). Should generally not exceed about 85% of physical memory.
-eoom
Exit on out-of-memory exception. Requires Java 8u92+.
-da
Disable Java assertions for performance.
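The 85% guideline for -Xmx is simple arithmetic; the helper below (a hypothetical `HeapSizing` class, rounding down to whole GiB) turns a physical memory size into a suggested flag.

```java
// Illustrative arithmetic only: suggest an -Xmx value at roughly 85% of physical memory.
class HeapSizing {
    static final long GIB = 1L << 30;

    /** Floor of 85% of physical memory, in whole GiB, as an -Xmx flag (at least 1g). */
    static String suggestXmx(long physicalBytes) {
        long gib = (long) Math.floor(physicalBytes * 0.85 / GIB);
        return "-Xmx" + Math.max(1, gib) + "g";
    }
}
```

By the same rule, the -Xmx45g examples fit on machines with about 53 GiB of RAM or more, and -Xmx128g needs about 151 GiB.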

Setting Up Sketch Servers

Setting up your own sketch server involves three main steps: preparing taxonomy files, creating reference sketches, and starting the server. The BBTools package includes pipeline scripts demonstrating this process.

1. Prepare Taxonomy Files

# Download and prepare taxonomy files (see fetchTaxonomy.sh)
# This creates tree.taxtree.gz, gitable.int1d.gz, and accession files

2. Create Reference Sketches

# Sketch a reference database (see fetchNt.sh, fetchRefSeq.sh)
sketch.sh in=refseq.fa.gz out=refseq#.sketch files=31 mode=taxa \
  tree=tree.taxtree.gz gi=gitable.int1d.gz taxlevel=subspecies
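With files=31, the # in out=refseq#.sketch stands for a file index, which is why the server is later started with refseq*.sketch. The helper below expands such a pattern; numbering the files from 0 is an assumption, and `SketchFileNames` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Expands an output pattern like "refseq#.sketch" into per-file names.
// Assumption: '#' is replaced by a 0-based index.
class SketchFileNames {
    static List<String> expand(String pattern, int files) {
        int hash = pattern.indexOf('#');
        if (hash < 0) throw new IllegalArgumentException("Pattern has no '#': " + pattern);
        List<String> names = new ArrayList<>(files);
        for (int i = 0; i < files; i++) {
            names.add(pattern.substring(0, hash) + i + pattern.substring(hash + 1));
        }
        return names;
    }
}
```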

3. Start the Server

# Start server with taxonomy files and sketches
taxserver.sh -Xmx45g tree=tree.taxtree.gz gi=gitable.int1d.gz \
  accession=*.accession2taxid.gz refseq*.sketch port=1234 \
  domain=your.domain.org killcode=your_password

Important Configuration Note

When using custom taxonomy file locations (not at JGI), add the taxpath=X parameter to all BBTools commands, where X is the path containing your taxonomy files. This applies to fetchNt.sh, startNtServer.sh, and all sketch operations.
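The effect of taxpath on "auto" values can be pictured as a path lookup: an explicit path is used as-is, while "auto" is resolved inside the taxpath directory. This is a sketch under assumptions: `TaxPath.resolve` is hypothetical, and the default file name passed in (tree.taxtree.gz here) is taken from the examples, not from TaxServer's own defaults.

```java
import java.nio.file.Path;

// Hypothetical resolution of "auto" against taxpath; default file names are assumptions.
class TaxPath {
    static String resolve(String value, String defaultName, String taxpath) {
        if (!"auto".equalsIgnoreCase(value)) return value; // explicit path wins
        return Path.of(taxpath, defaultName).toString();
    }
}
```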

Examples

Basic Taxonomy Server

taxserver.sh tree=tree.taxtree.gz table=gitable.int1d.gz port=1234

Start a taxonomy translation server with GI number support.

Full-Featured Server

taxserver.sh -Xmx45g tree=tree.taxtree.gz table=gitable.int1d.gz \
  accession=prot.accession2taxid.gz,nucl_wgs.accession2taxid.gz \
  port=1234 refseq*.sketch

Server with taxonomy translation, accession support, and sketch database.

LBL Configuration

taxserver.sh tree=auto table=auto accession=auto port=1234

Use default LBL file locations with auto-detection.

Custom Path Setup

taxserver.sh -Xmx45g tree=auto table=auto accession=auto \
  port=1234 taxpath=/custom/taxonomy/path refseq*.sketch

Auto-detection with custom taxonomy file directory.

Sketch-Only Mode

taxserver.sh tree=tree.taxtree.gz port=1234 sketchonly=t \
  k=31,24 refseq*.sketch

Sketch comparison only, with dual kmer lengths for enhanced sensitivity.

High-Performance Configuration

taxserver.sh -Xmx128g tree=auto table=auto accession=auto \
  port=1234 sketchcomparethreads=32 prealloc=0.8 \
  refseq*.sketch nt*.sketch

High-memory server with increased threading and preallocation for large datasets.

Client Usage

Once your TaxServer is running, clients can query it using SendSketch or direct HTTP requests:

SendSketch Queries

# Query your local server
sendsketch.sh in=assembly.fa address=http://localhost:1234/sketch

# Query with additional parameters
sendsketch.sh in=reads.fq address=http://your.server.org:1234/sketch \
  reads=1m samplerate=0.5 minkeycount=2

JGI Public Servers

# Use JGI's public servers (shorthand notation)
sendsketch.sh in=assembly.fa nt
sendsketch.sh in=assembly.fa refseq
sendsketch.sh in=assembly.fa silva

# Equivalent full addresses:
# https://nt-sketch.jgi-psf.org/sketch
# https://refseq-sketch.jgi-psf.org/sketch
# https://ribo-sketch.jgi-psf.org/sketch
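The shorthand names are just aliases for full addresses. A client-side lookup using exactly the pairs listed above might look like this; `PublicServers` is a hypothetical class, and any value that is not a known shorthand is passed through unchanged as an address.

```java
import java.util.Locale;
import java.util.Map;

// Shorthand-to-address table using the JGI addresses listed above.
class PublicServers {
    static final Map<String, String> ADDRESSES = Map.of(
            "nt", "https://nt-sketch.jgi-psf.org/sketch",
            "refseq", "https://refseq-sketch.jgi-psf.org/sketch",
            "silva", "https://ribo-sketch.jgi-psf.org/sketch");

    /** Returns the full address for a shorthand, or the input itself if it is not one. */
    static String resolve(String nameOrAddress) {
        return ADDRESSES.getOrDefault(nameOrAddress.toLowerCase(Locale.ROOT), nameOrAddress);
    }
}
```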

Algorithm Details

Server Architecture

TaxServer implements a multi-threaded HTTP server using Java's HttpServer framework (com.sun.net.httpserver) with four specialized request handlers.

Request processing uses Executors.newFixedThreadPool() with configurable thread count (handlerThreads parameter, default max(2, CPU_cores)).
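The general pattern (one HttpServer, a handler per URL prefix, and a fixed thread pool of max(2, CPU cores) threads) can be shown with the JDK alone. This is a minimal sketch of the pattern, not TaxServer's handlers: `MiniTaxServer` and its placeholder responses are invented.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal illustration: per-prefix handlers served by a fixed-size thread pool.
class MiniTaxServer {
    /** Mirrors the documented default handler thread count: max(2, CPU cores). */
    static int defaultHandlerThreads() {
        return Math.max(2, Runtime.getRuntime().availableProcessors());
    }

    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            // Placeholder handlers; each context routes one URL prefix to one handler.
            server.createContext("/help", ex -> reply(ex, "usage text would go here\n"));
            server.createContext("/tax/", ex -> reply(ex, "lookup for " + ex.getRequestURI().getPath() + "\n"));
            server.setExecutor(Executors.newFixedThreadPool(defaultHandlerThreads()));
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void stop(HttpServer server) {
        server.stop(0);
        Executor executor = server.getExecutor();
        if (executor instanceof ExecutorService) {
            ((ExecutorService) executor).shutdown(); // let pool threads exit so the JVM can stop
        }
    }

    private static void reply(HttpExchange exchange, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(1234);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

A fixed pool bounds the number of requests processed at once, so a burst of queries queues instead of spawning unbounded threads against a large in-memory database.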

Memory-Resident Data Structures

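The server loads and maintains several key data structures in memory: the taxonomy tree, the GI table, optional accession and IMG lookup tables, and the reference sketches.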

Sketch Comparison Engine

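The server's sketch functionality provides several comparison modes, matching the sketch endpoints: comparison of sketches submitted by clients, sketching of files stored on the server (internal clients only), and comparison against specific reference sketches by taxonomy ID.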

Thread management uses maxConcurrentSketchCompareThreads (default 16) for comparison operations and maxConcurrentSketchLoadThreads (default 4) for file I/O.
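The core of a MinHash-style comparison is counting hash values shared by two sketches, and the thread cap limits how many comparisons run at once. The sketch below assumes each sketch is an array of hash values sorted in ascending order and submits one task per reference to a pool capped at 16 threads; it is a simplified illustration (BBSketch reports more than a raw shared count, and `SketchCompare` is a hypothetical class).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified shared-hash counting with a capped comparison thread pool.
class SketchCompare {
    static final int MAX_COMPARE_THREADS = 16; // mirrors the documented default

    /** Counts values present in both ascending-sorted arrays with a merge walk, O(a + b). */
    static int countShared(long[] a, long[] b) {
        int i = 0, j = 0, shared = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { shared++; i++; j++; }
        }
        return shared;
    }

    /** Compares one query against every reference; results are in reference order. */
    static int[] compareAll(long[] query, List<long[]> refs, int requestedThreads) {
        int threads = Math.max(1, Math.min(requestedThreads, MAX_COMPARE_THREADS));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (long[] ref : refs) {
                Callable<Integer> task = () -> countShared(query, ref);
                futures.add(pool.submit(task));
            }
            int[] result = new int[refs.size()];
            for (int r = 0; r < result.length; r++) result[r] = futures.get(r).get();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```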

Performance Optimizations

Query Processing Pipeline

Each sketch query follows this processing sequence:

  1. Request parsing: Extract parameters and query mode from URL path
  2. Access control: Verify client permissions based on IP address and query type
  3. Sketch loading: Parse incoming sketch data or load from specified files
  4. Compatibility check: Verify kmer length and hash version match server configuration
  5. Database search: Compare query sketch(es) against reference database
  6. Result formatting: Generate response with similarity metrics and taxonomy information
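Step 4 can be pictured as a small guard that runs before any database search. This is a hypothetical version: `CompatibilityCheck` is invented, and representing the hash version as a single integer is an assumption. An empty result means the query may proceed; otherwise the message explains the mismatch.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical pipeline step 4: reject queries whose kmer length or hash version
// does not match the server configuration.
class CompatibilityCheck {
    static Optional<String> check(int queryK, int queryHashVersion, int[] serverKs, int serverHashVersion) {
        if (queryHashVersion != serverHashVersion) {
            return Optional.of("Hash version mismatch: query " + queryHashVersion
                    + ", server " + serverHashVersion);
        }
        for (int k : serverKs) {
            if (k == queryK) return Optional.empty(); // compatible
        }
        return Optional.of("Kmer length " + queryK + " not served; server uses " + Arrays.toString(serverKs));
    }
}
```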

API Endpoints

Taxonomy Translation

/tax/name/<organism_name>
Look up taxonomy by organism name (e.g., /tax/name/Escherichia%20coli)
/tax/taxid/<taxonomy_id>
Look up taxonomy by NCBI taxonomy ID (e.g., /tax/taxid/511145)
/tax/gi/<gi_number>
Look up taxonomy by GI number (e.g., /tax/gi/556503834)
/tax/accession/<accession>
Look up taxonomy by accession number (e.g., /tax/accession/NC_000913)
/stax/
Simple taxonomy queries returning only canonical taxonomic levels
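A taxonomy lookup is a plain GET, so any HTTP client works. The sketch below uses the JDK's java.net.http client; `TaxClient` is a hypothetical class, and it prints the response body as-is because the response format is not described here. Note that URLEncoder produces form encoding ('+' for spaces), so spaces are converted to %20 for use in a URL path, as in /tax/name/Escherichia%20coli.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /tax/<type>/<value> endpoints.
class TaxClient {
    /** Builds e.g. http://localhost:1234/tax/name/Escherichia%20coli */
    static URI taxUri(String base, String type, String value) {
        // A literal '+' is encoded as %2B, so any remaining '+' came from a space.
        String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(base + "/tax/" + type + "/" + encoded);
    }

    static String lookup(String base, String type, String value) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(taxUri(base, type, value)).GET().build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup("http://localhost:1234", "name", "Escherichia coli"));
    }
}
```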

Sketch Comparison

/sketch/
Submit sketch data for comparison against reference database (POST request)
/sketch/file/<filename>
Process local server files for sketching (internal clients only)
/sketch/ref/<taxid>
Compare against specific reference sketches by taxonomy ID
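For a custom client, a sketch query is a POST to the sketch endpoint. This sketch assumes the request body is simply the text of a sketch file written by sketch.sh; SendSketch's exact wire format is not specified in this document, so treat `SketchPost` (a hypothetical class) as a starting point rather than a drop-in replacement for sendsketch.sh.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical /sketch client. Assumption: the POST body is raw sketch-file text.
class SketchPost {
    static HttpRequest buildPost(String base, String sketchText) {
        return HttpRequest.newBuilder(URI.create(base + "/sketch"))
                .POST(HttpRequest.BodyPublishers.ofString(sketchText, StandardCharsets.UTF_8))
                .build();
    }

    static String send(String base, Path sketchFile) throws IOException, InterruptedException {
        String text = Files.readString(sketchFile, StandardCharsets.UTF_8);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildPost(base, text), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SketchPost query.sketch
        System.out.println(send("http://localhost:1234", Path.of(args[0])));
    }
}
```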

Server Management

/help
Display server usage documentation
/usage
Alias for /help endpoint
/stats
Server performance statistics and query metrics
/kill/<password>
Secure server shutdown (requires killcode parameter)
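A /kill/<password> handler must compare the code in the request path against killcode. The check below is a hypothetical sketch: it treats an unset killcode as "shutdown disabled" (an assumption consistent with "requires killcode parameter") and uses MessageDigest.isEqual, which avoids an early-exit string comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical authorization check for /kill/<password>.
class KillCheck {
    static boolean authorized(String requestPath, String killCode) {
        if (killCode == null || killCode.isEmpty()) return false; // assumed: no killcode, no remote shutdown
        String prefix = "/kill/";
        if (!requestPath.startsWith(prefix)) return false;
        byte[] given = requestPath.substring(prefix.length()).getBytes(StandardCharsets.UTF_8);
        byte[] expected = killCode.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(given, expected);
    }
}
```

Because the code travels in the URL, it can appear in proxy or access logs; serving the endpoint over HTTPS and choosing a code not reused elsewhere limits that exposure.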

File Requirements

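Essential Files

tree.taxtree.gz
Taxonomy tree. Always required.
Reference sketches (e.g. refseq*.sketch)
Required for sketch comparison; not needed for a taxonomy-only server.

Optional Enhancement Files

gitable.int1d.gz
GI table, needed for GI number lookups.
*.accession2taxid.gz
Accession tables (e.g. prot.accession2taxid.gz, nucl_wgs.accession2taxid.gz), needed for accession lookups.
IMG dump file
Enables IMG genome database support (img= parameter).
Pattern file
Enables compressed accession storage (pattern= parameter).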

File Preparation

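Use the provided pipeline scripts for file preparation: fetchTaxonomy.sh creates the taxonomy files, and fetchNt.sh or fetchRefSeq.sh create reference sketches.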

Performance Considerations

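Memory Requirements

Taxonomy tables and reference sketches are all held in the Java heap, so -Xmx must be large enough for everything loaded. The examples in this document use -Xmx45g for RefSeq sketches and -Xmx128g for RefSeq plus nt. See also the prealloc parameter and the out-of-memory entry under Troubleshooting.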

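Threading Configuration

sketchcomparethreads (default 16) limits comparison threads per connection, and sketchloadthreads (default 4) limits threads used to load local files. HTTP requests are handled by a fixed thread pool whose size defaults to max(2, CPU cores).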

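Startup Time

Taxonomy files and reference sketches are loaded once, at startup; for large reference sets this load dominates, which is why keeping them resident makes individual queries fast. To replace a running server, start the new instance with oldaddress and oldcode; it shuts down the prior instance only after it has finished initializing.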

Troubleshooting

Common Issues

Port binding errors
Server waits up to 8 iterations with exponential backoff if port is busy
Out of memory errors
Increase -Xmx allocation or use prealloc parameter to optimize memory layout
Slow sketch loading
Verify sketch files are accessible and consider increasing sketchloadthreads
Access denied for file queries
File queries are limited to internal clients: verify that addressprefix matches the client IP, or set allowremotefileaccess=t to permit external clients.
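Waiting for a busy port with exponential backoff can be sketched with the JDK's HttpServer, which throws BindException when the port is in use. This is an illustration, not TaxServer's code: `BindWithRetry` is hypothetical, the cap of 8 attempts follows the text above, and the base delay is a caller-chosen assumption.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;

// Retry binding a busy port, doubling the wait each time, for at most 8 attempts.
class BindWithRetry {
    static final int MAX_ATTEMPTS = 8;

    /** Delay after failed attempt n (0-based): base, 2*base, 4*base, ... */
    static long delayMillis(long baseMillis, int attempt) {
        return baseMillis << attempt;
    }

    static HttpServer bind(int port, long baseMillis) throws IOException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return HttpServer.create(new InetSocketAddress(port), 0);
            } catch (BindException e) {
                if (attempt + 1 >= MAX_ATTEMPTS) throw e; // give up after the last attempt
                Thread.sleep(delayMillis(baseMillis, attempt));
            }
        }
    }
}
```

Doubling the delay gives a prior instance that is still shutting down (for example, one stopped through oldaddress) time to release the port without the new server polling aggressively.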

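Monitoring

The /stats endpoint reports server performance statistics and query metrics. When running the server in the background, redirect its output to a log file (e.g. 1>log.o 2>&1) to capture server messages.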

Integration with BBSketch Ecosystem

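TaxServer is designed to work with the other BBSketch tools: sketch.sh builds reference sketches, comparesketch.sh compares sketches locally, and sendsketch.sh sends queries to a running server.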

The typical workflow involves using sketch.sh to create reference databases, TaxServer to serve them, and sendsketch.sh for queries. This architecture enables high-throughput taxonomic identification with minimal per-query overhead.

Notes

Support

For questions and support: