CladeServer

Script: cladeserver.sh Package: clade Class: CladeServer.java

Starts a CladeServer for taxonomic classification using the QuickClade architecture. CladeServer is an HTTP server that loads a reference clade database into memory once and then serves many client requests from it. Because the expensive database load happens only at startup, clients need far less memory than they would to load the database themselves, and the server can support multiple users or batch processing workflows. The server receives text-encoded Clade objects (NOT raw FASTA) from SendClade clients and compares their k-mer frequencies against the preloaded reference database.

Basic Usage

cladeserver.sh ref=<file>

CladeServer requires a reference clade database file (typically a .spectra file) and starts an HTTP server that processes taxonomic classification requests. Results can be returned in human-readable format or tab-delimited machine format suitable for downstream analysis pipelines.

Parameters

Parameters control server configuration, security settings, and default processing behavior.

Server Parameters

port=3069: Server listening port (default 3069). Choose a port that is not already in use; clients must specify the same port when connecting to the server.
killcode=<string>: Security code for remote server shutdown. When specified, the server can be shut down remotely by requesting the /kill/<killcode> endpoint; without a kill code, the server can only be stopped locally. Choose a long, unpredictable value.
localhost=t: Allow connections from localhost (127.0.0.1). Set to false to restrict localhost access in security-sensitive environments.
prefix=<string>: Required address prefix for client connections. Only clients connecting from IP addresses starting with this prefix will be allowed. Useful for restricting access to specific subnets or IP ranges, e.g., prefix=/10.0.0 or prefix=/192.168.1.
remotefileaccess=f: Allow remote file access through the server. When enabled, clients can potentially access files on the server filesystem. For security, keep this disabled unless it is specifically needed.
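
The prefix rule can be illustrated with a small shell sketch. It assumes the server does a plain string-prefix comparison against the client address written with a leading slash, as the prefix= examples suggest; the function name addr_allowed is hypothetical and is not part of BBTools.

```shell
#!/bin/sh
# Hypothetical helper, not part of BBTools.
# Succeeds when an address (written "/a.b.c.d", matching the slash-prefixed
# form used in the prefix= examples) starts with the given prefix.
# Assumption: the server's check is a plain string-prefix comparison.
addr_allowed() {
    prefix=$1
    addr=$2
    case "$addr" in
        "$prefix"*) return 0 ;;   # address begins with the prefix
        *)          return 1 ;;   # anything else is rejected
    esac
}

addr_allowed /10.0.0 /10.0.0.17 && echo "allowed"
addr_allowed /10.0.0 /10.1.0.17 || echo "rejected"
# A prefix without a trailing dot also matches longer octets:
addr_allowed /192.168.1 /192.168.10.4 && echo "also allowed"
```

Under the same string-matching assumption, prefix=/192.168.1 would admit 192.168.10.* as well as 192.168.1.*; ending the prefix with a dot (prefix=/192.168.1.) would narrow it to the intended range.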

Processing Parameters

ref=<file>: Reference clade database file (REQUIRED). Should be a .spectra file generated by CladeLoader or similar BBTools clade utilities. This database is loaded once at server startup and used for all subsequent taxonomic classifications. Large databases may require several minutes to load and significant memory.
in=<file>: Alternative to ref parameter. Input file for reference clade database.
hits=1: Default number of top taxonomic hits to return per query. Clients can override this parameter in their requests. More hits provide alternative classifications but increase response size and processing time.
heap=1: Default number of intermediate comparison results to store during processing. Higher values may improve accuracy for complex queries but increase memory usage. Clients can override this in individual requests.
format=human: Default output format. Options are 'human' for readable output with detailed information, or 'oneline'/'machine' for tab-delimited format suitable for parsing. Clients can specify format preferences in their requests.
banself=f: Default setting for banning self-matches. When true, ignores records with the same TaxID as the query, useful for accuracy testing. Clients can override this per request.
bandupes=f: Default setting for banning duplicate matches. When true, prevents the same reference from appearing multiple times, ensuring all hits represent distinct classifications.
printqtid=f: Default setting for printing query TaxIDs when present in sequence headers. Useful for benchmarking with labeled data containing taxonomic information in headers.

Verbose Parameters

verbose=f: Enable standard verbose logging. Shows request processing, timing information, and basic server statistics. Useful for monitoring server activity and performance.
verbose2=f: Enable detailed debug logging. Shows extensive debugging information including HTTP headers, request parsing details, and step-by-step processing. Generates significant log output; use only for debugging specific issues.

Java Parameters

-Xmx: Set Java's maximum memory usage. Default is 8GB (-Xmx8g) for CladeServer. Large custom databases may require additional memory. Memory is allocated once at startup and reused for all subsequent requests.
-Xms: Set Java's initial memory allocation. Default is 8GB (-Xms8g) matching -Xmx to avoid memory reallocation during operation.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.
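
As an illustration of the memory flags above, the sketch below prints (rather than runs) a launch command that sets -Xmx and -Xms to the same value. The helper name launch_cmd and the 16g figure are hypothetical, and the sketch assumes the Java flags are passed alongside the other arguments on the cladeserver.sh command line.

```shell
#!/bin/sh
# Hypothetical helper: print a launch command with matching -Xmx and -Xms.
# Nothing is executed; the output is meant to be reviewed before use.
launch_cmd() {
    ref=$1   # reference database file
    mem=$2   # heap size, e.g. 16g (illustrative value)
    printf 'cladeserver.sh ref=%s -Xmx%s -Xms%s -eoom\n' "$ref" "$mem" "$mem"
}

launch_cmd my_custom_db.spectra.gz 16g
# prints: cladeserver.sh ref=my_custom_db.spectra.gz -Xmx16g -Xms16g -eoom
```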

Examples

Basic Server Startup

cladeserver.sh ref=refseqA48_with_ribo.spectra.gz

Start server on default port 3069 with standard reference database.

Explicit Port with Kill Code

cladeserver.sh ref=refseqA48_with_ribo.spectra.gz port=3069 killcode=magical_girl_2025

Start server with the port given explicitly (3069, the same as the default) and a kill code for remote shutdown. Use a less guessable kill code in production.

Verbose Monitoring

cladeserver.sh ref=refseqA48_with_ribo.spectra.gz verbose=t localhost=f

Start server with verbose logging enabled and localhost access disabled for security.

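Custom Database with Larger Comparison Heap

cladeserver.sh ref=my_custom_db.spectra.gz port=8080 heap=10 verbose2=t

Start server with a custom database, a non-default port, a larger comparison heap (heap=10 intermediate results per query; this is not JVM memory, which is set with -Xmx), and detailed debug logging.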

Subnet-Restricted Access

cladeserver.sh ref=bacteria_only.spectra.gz port=3069 prefix=/10.0.0

Start server restricted to clients from the 10.0.0.* subnet for network security.

Server Architecture

HTTP Server Infrastructure

CladeServer uses Java HTTP server infrastructure to handle concurrent requests efficiently. The server creates separate handlers for different endpoints:

/clade: Main classification endpoint for processing taxonomic queries
/kill: Secure shutdown endpoint (requires kill code)
/stats: Server statistics including uptime and query counts
/: Help information and usage guidance

Supported Request Types

CladeServer supports multiple request formats for different use cases:

Standard Clade format: Complete k-mer frequency profiles for taxonomic classification
PreClade format: Privacy-preserving classification with compressed k-mer counts
FetchClade: Fetch Clade file by taxID or organism name (planned feature)
FetchSSU: Fetch 16S/18S sequences by taxID or organism name (planned feature)
CompareSSU: Align query SSU to references (planned feature)

Server Endpoints

Available HTTP endpoints for server interaction:

POST /clade: Main classification endpoint for submitting taxonomic queries
GET /kill/<code>: Shutdown server (requires kill code specified at startup)
GET /stats: Server statistics and uptime information
GET /: Usage help and server information
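
The GET endpoints can be queried with any HTTP client. The sketch below only builds the URLs and prints the corresponding curl commands, so it can be run without a server; endpoint_url is a hypothetical helper, and localhost:3069 assumes a server on the same machine using the default port. The POST /clade body is produced by SendClade and is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: build the URL for a server endpoint.
endpoint_url() {
    host=$1
    port=$2
    path=$3
    printf 'http://%s:%s%s\n' "$host" "$port" "$path"
}

# Dry run: show the curl commands without contacting a server.
echo "curl $(endpoint_url localhost 3069 /stats)"
echo "curl $(endpoint_url localhost 3069 /)"
```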

Remote Shutdown Process

Start server with killcode: cladeserver.sh ref=db.spectra killcode=secret123
Shutdown via HTTP: curl http://server:port/kill/secret123
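
The two steps above can be wrapped in a small script. This is a sketch: kill_server and the DRY_RUN switch are hypothetical names, and the host, port, and kill code are the placeholder values from the example.

```shell
#!/bin/sh
# Hypothetical helper: request shutdown through the /kill/<killcode> endpoint.
# With DRY_RUN=1 the curl command is printed instead of executed.
kill_server() {
    url="http://$1:$2/kill/$3"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "curl -s $url"
    else
        curl -s "$url"
    fi
}

DRY_RUN=1
kill_server localhost 3069 secret123
# prints: curl -s http://localhost:3069/kill/secret123
```

Unset DRY_RUN (or set it to 0) to send the request for real.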

Memory and Performance

Memory Requirements

Server memory usage depends primarily on reference database size, typically 4-16 GB for standard databases. The default allocation is 8 GB (-Xmx8g -Xms8g); large custom databases may require more. Because -Xms matches -Xmx by default, memory is allocated once at startup and reused for all subsequent requests.

Performance Characteristics

Database loading occurs once at startup and may take several minutes for large references. Once loaded, individual queries are processed quickly. The server is designed for high-throughput scenarios where many classification requests need to be processed efficiently. Concurrent requests are handled safely with thread-safe data structures.

Processing Pipeline

Startup: Load reference database into memory (one-time cost)
Request Reception: Receive Clade/PreClade data from client via HTTP POST
Security Check: Verify client IP address against prefix restrictions
Format Detection: Automatically detect Clade vs PreClade format
K-mer Comparison: Compare query k-mer frequencies against reference database
Result Formatting: Format results in requested output format (human/machine)
Response: Return classification results to client
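
The ordering of the per-request steps above can be traced with a toy script. Every step here is a stand-in that only prints its name; the function name handle_request, the "rejected" message, and the string-prefix address check are assumptions for illustration, not the server's actual behavior. The point is only that the security check happens before any comparison work.

```shell
#!/bin/sh
# Toy trace of the per-request steps (startup loading is omitted).
# All step bodies are placeholders; no real classification is performed.
handle_request() {
    addr=$1     # client address, e.g. /10.0.0.5
    prefix=$2   # required prefix, e.g. /10.0.0
    fmt=$3      # requested output format: human or machine
    case "$addr" in
        "$prefix"*) ;;                     # security check passed
        *) echo "rejected"; return 1 ;;    # stop before any processing
    esac
    echo "detect-format"
    echo "compare-kmers"
    echo "format-as-$fmt"
    echo "respond"
}

handle_request /10.0.0.5 /10.0.0 human
```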

Security Considerations

Kill Code: Use killcode parameter for secure remote shutdown capability. Choose a strong, unpredictable password.
Access Control: Configure localhost and prefix parameters to restrict access appropriately. Default allows localhost connections.
File Access: Keep remotefileaccess=false unless specifically required. Enabling this can expose server filesystem.
Port Selection: Choose non-standard ports for production deployments to reduce unauthorized access attempts. This only lowers exposure to casual scans; use the prefix parameter for actual access restriction.
Logging: Monitor logs (verbose mode) for unauthorized access attempts and unusual activity patterns.
Network Security: Use prefix parameter to restrict access to trusted subnets or IP ranges.
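
One way to follow the kill code advice is to generate the code from the system's random source. The sketch below is an assumption-laden example rather than a BBTools feature: gen_killcode is a hypothetical helper, /dev/urandom must be available, and the printed launch command reuses placeholder values (db.spectra, prefix=/10.0.0) from earlier examples.

```shell
#!/bin/sh
# Hypothetical helper: 16 random bytes from /dev/urandom as 32 hex characters.
# Hex is URL-safe, which matters because the code travels in the
# /kill/<killcode> path.
gen_killcode() {
    od -An -N16 -tx1 /dev/urandom | tr -dc '0-9a-f'
}

code=$(gen_killcode)
# Print (do not run) a launch command that combines the settings above.
echo "cladeserver.sh ref=db.spectra killcode=$code prefix=/10.0.0 remotefileaccess=f"
```

Store the generated code somewhere private; anyone who can read it and reach the server can shut it down.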

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org