Start RefSeq Server VM
Server startup script for launching a RefSeq taxonomy server with sketch-based taxonomic identification capabilities. This script is designed to run on JGI's web infrastructure (jgi-web-4) and provides taxonomic classification services via HTTP API.
Overview
The startRefseqServerVM.sh script launches a high-memory taxonomy server specifically configured for RefSeq database operations. It uses BBTools' taxserver.sh with optimized parameters for sketch-based taxonomic identification using dual k-mer lengths (k=32,24) and includes security features for remote management.
Server Configuration
Hardware Requirements
- Memory: 28GB RAM allocated to Java heap (-Xmx28g)
- Storage: Access to RefSeq database files
- Network: Port 3072 for HTTP service
- Platform: JGI web infrastructure (jgi-web-4)
Service Parameters
Parameter | Value | Description |
---|---|---|
Port | 3072 | HTTP service port for API access |
Domain | https://refseq-sketch.jgi.doe.gov | Public domain for service access |
Database | RefSeq | RefSeq taxonomic database |
K-mer lengths | k=32,24 | Dual k-mer strategy for sensitivity/specificity |
Memory allocation | 28GB | Java heap size with 90% preallocation |
Security Features
Remote Management
The server includes security features for remote administration:
- Kill Code: Password-protected remote shutdown capability
- Kill URL: https://refseq-sketch.jgi.doe.gov/kill/ for administrative access
- Old Instance Handling: On startup, the oldcode and oldaddress parameters are used to shut down the previous server instance through its kill URL
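The launch command passes the kill code as killcode=xxxxx. One way to keep the real value out of the script is to read it from a private file at startup. The following is a minimal sketch under that assumption; the function name load_killcode and the path ~/.refseq_killcode are hypothetical and not part of the original script.

```shell
# Hypothetical helper (not part of startRefseqServerVM.sh): read the kill
# code from a private file instead of hard-coding it in the script.
load_killcode() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "kill code file not readable: $file" >&2
        return 1
    fi
    # Use the first line only and strip all whitespace.
    code=$(head -n 1 "$file" | tr -d '[:space:]')
    if [ -z "$code" ]; then
        echo "kill code file is empty: $file" >&2
        return 1
    fi
    printf '%s\n' "$code"
}

# Usage (assumed file location):
#   KILLCODE=$(load_killcode "$HOME/.refseq_killcode") || exit 1
#   taxserver.sh ... killcode="$KILLCODE" ...
```

Values passed on the command line can still show up in the process list, so this only keeps the code out of the script file itself.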
Server Launch Command
Production Configuration
nohup /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh \
-da -Xmx28g \
prealloc=0.9 \
port=3072 \
verbose \
tree=auto \
sizemult=2 \
sketchonly \
index \
domain=https://refseq-sketch.jgi.doe.gov \
killcode=xxxxx \
oldcode=xxxxx \
oldaddress=https://refseq-sketch.jgi.doe.gov/kill/ \
RefSeq \
k=32,24 \
1>>refseqlogVM_32.txt 2>&1 &
Parameter Explanation
- -da: Disable Java assertions for production
- -Xmx28g: Allocate 28GB to Java heap
- prealloc=0.9: Preallocate 90% of memory structures
- verbose: Enable detailed logging
- tree=auto: Automatically locate taxonomy tree files
- sizemult=2: Size multiplier for hash tables
- sketchonly: Restrict to sketch-based operations only
- index: Build index for faster queries
- nohup: Ignore hangup signals (SIGHUP) so the server keeps running after the launching shell exits
- &: Run the command as a background process
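The combination of nohup, 1>>file 2>&1, and & can be tried in isolation. The sketch below uses a throwaway command in place of taxserver.sh; the function name and the log path are illustrative only.

```shell
# Illustrative only: same launch pattern as the production command, with a
# short stand-in command instead of taxserver.sh.
demo_background_logging() {
    log="$1"
    # 1>>"$log" appends stdout to the log (earlier contents are kept);
    # 2>&1 then sends stderr to the same place; & backgrounds the job.
    nohup sh -c 'echo "to stdout"; echo "to stderr" >&2' 1>>"$log" 2>&1 &
    wait "$!"   # the demo waits; a real server would be left running
}

# Usage:
#   demo_background_logging /tmp/demo.log && cat /tmp/demo.log
```

Because the redirection appends, restarting the server adds to refseqlogVM_32.txt rather than replacing it.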
Testing Configuration
The script includes a simplified testing configuration (commented out by default):
# Simple mode for testing:
nohup /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh \
-ea -Xmx28g \
port=3072 \
verbose \
tree=auto \
sizemult=2 \
sketchonly \
RefSeq \
k=32,24 \
index=t
Testing vs Production Differences
- -ea vs -da: Enable vs disable assertions
- No security parameters: No kill codes or domain settings
- index=t: Explicit index building flag
- No preallocation: Simplified memory management
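- No logging redirection: Output is not appended to refseqlogVM_32.txt; because the command still runs under nohup, output goes to nohup.out when it is started from a terminal
- No trailing &: The test server runs in the foreground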
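To make the differences easy to inspect, the two argument lists can be generated from one place and compared. This is a sketch only: the function taxserver_args and the KILLCODE and OLDCODE environment variables are hypothetical, and the xxxxx placeholders are kept as they appear in the script.

```shell
# Hypothetical helper: print the taxserver.sh arguments for a given mode,
# one per line, mirroring the two configurations documented above.
taxserver_args() {
    mode="$1"
    case "$mode" in
        production)
            printf '%s\n' -da -Xmx28g prealloc=0.9 port=3072 verbose \
                tree=auto sizemult=2 sketchonly index \
                domain=https://refseq-sketch.jgi.doe.gov \
                "killcode=${KILLCODE:-xxxxx}" "oldcode=${OLDCODE:-xxxxx}" \
                oldaddress=https://refseq-sketch.jgi.doe.gov/kill/ \
                RefSeq k=32,24
            ;;
        testing)
            printf '%s\n' -ea -Xmx28g port=3072 verbose tree=auto \
                sizemult=2 sketchonly RefSeq k=32,24 index=t
            ;;
        *)
            echo "usage: taxserver_args production|testing" >&2
            return 1
            ;;
    esac
}

# Usage: show which arguments differ between the two modes.
#   taxserver_args production > prod.txt
#   taxserver_args testing > test.txt
#   diff prod.txt test.txt
```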
Service Capabilities
Sketch-Based Taxonomic Identification
The server provides high-performance taxonomic classification using:
- Dual K-mer Strategy: Uses k=32 and k=24 for optimal sensitivity/specificity balance
- RefSeq Database: Complete RefSeq taxonomic database coverage
- Memory-Optimized: 28GB allocation with preallocation for fast response
- HTTP API: RESTful interface for taxonomic queries
Expected Use Cases
- Real-time taxonomic classification of sequences
- Batch processing of metagenomic samples
- Quality control and contamination detection
- Taxonomic profiling of environmental samples
- Integration with bioinformatics pipelines
Monitoring and Maintenance
Log Files
- refseqlogVM_32.txt: Primary server log with timestamps
- Standard Output/Error: Redirected to log file
- Background Process: Started with nohup and &, so logging continues after the launching shell exits
Server Management
- Status Checking: Monitor via HTTP health checks
- Remote Shutdown: Use kill URL with proper authentication
- Process Monitoring: Check system processes for java/taxserver
- Log Rotation: Manage log file growth over time
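The status check and log rotation items above could be scripted roughly as follows. Function names, the 10-second timeout, and the size threshold are assumptions for illustration; the script itself does not include these helpers.

```shell
# Hypothetical helpers for routine checks; names and thresholds are
# illustrative, not part of the original script.

# Succeeds if the URL answers with a non-error HTTP status within 10 s.
server_is_up() {
    curl -fsS -o /dev/null --max-time 10 "$1"
}

# Copy-and-truncate rotation: the server keeps its log open in append
# mode (1>>), so the file is emptied in place rather than moved.
rotate_log() {
    log="$1"
    max_bytes="$2"
    size=$(wc -c < "$log" | tr -d ' ')
    if [ "$size" -gt "$max_bytes" ]; then
        cp "$log" "$log.$(date +%Y%m%d%H%M%S)" && : > "$log"
    fi
}

# Usage:
#   server_is_up https://refseq-sketch.jgi.doe.gov || echo "server down" >&2
#   rotate_log refseqlogVM_32.txt 104857600   # rotate above 100 MiB
```

Lines written between the copy and the truncation can be lost, which is usually acceptable for a server log but worth knowing.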
Prerequisites
System Requirements
- JGI infrastructure access (jgi-web-4)
- Java Runtime Environment (Java 8 or later)
- BBTools installation at specified path
- RefSeq database files in accessible location
- Sufficient memory (32GB+ recommended, leaving headroom above the 28GB Java heap)
- Network access to port 3072
File Dependencies
- taxserver.sh: BBTools taxonomy server script
- RefSeq database files: Taxonomy trees and tables
- BBTools libraries: Required Java classes and resources
- Configuration files: Auto-detected taxonomy resources
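Some of these prerequisites can be checked before launching. The sketch below is an assumption-laden example: the function names are hypothetical, the memory check relies on the Linux /proc/meminfo format, and the threshold is supplied by the caller.

```shell
# Hypothetical preflight checks; names and thresholds are assumptions.

# Print MemAvailable in kB from a meminfo-style file (Linux /proc/meminfo).
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Fail with a message unless every checked prerequisite looks satisfied.
preflight() {
    taxserver="$1"      # path to taxserver.sh
    need_kb="$2"        # required available memory in kB
    command -v java >/dev/null 2>&1 || { echo "java not found" >&2; return 1; }
    [ -x "$taxserver" ] || { echo "not executable: $taxserver" >&2; return 1; }
    avail=$(mem_available_kb)
    [ "${avail:-0}" -ge "$need_kb" ] || {
        echo "only ${avail:-0} kB available" >&2
        return 1
    }
}

# Usage (28 GB expressed in kB):
#   preflight /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh 29360128
```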
Usage Examples
Starting the Server
# Navigate to the script directory
cd /path/to/pipelines/server/
# Launch the RefSeq server
bash startRefseqServerVM.sh
# Verify server is running (-i ignores case; [t] keeps grep from matching itself)
ps aux | grep -i '[t]axserver'
# Check the log for startup messages
tail -f refseqlogVM_32.txt
Testing Server Response
# Basic health check
curl https://refseq-sketch.jgi.doe.gov
# Test taxonomic query (example)
curl -X POST https://refseq-sketch.jgi.doe.gov/query \
-d "sequence=ATCGATCGATCG..."
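Related Tools lists sendsketch.sh as the client for querying taxonomy servers, so a query through it may be more representative than a raw curl call. The sketch below assumes sendsketch.sh is on PATH and accepts in= and address= parameters, and that the server's sketch endpoint is /sketch; these details come from BBTools' usage text as recalled here and should be confirmed against the installed version. The function name is hypothetical.

```shell
# Assumes BBTools' sendsketch.sh is on PATH. The in= and address= parameters
# and the /sketch path are assumptions; confirm them against the installed
# BBTools version before relying on this.
query_refseq_server() {
    fasta="$1"
    address="${2:-https://refseq-sketch.jgi.doe.gov/sketch}"
    [ -r "$fasta" ] || { echo "cannot read $fasta" >&2; return 1; }
    sendsketch.sh in="$fasta" address="$address"
}

# Usage:
#   query_refseq_server contigs.fa
```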
Server Shutdown
# Remote shutdown (requires password)
curl https://refseq-sketch.jgi.doe.gov/kill/[password]
# Or kill the process directly
pkill -f "taxserver.sh.*RefSeq"
Troubleshooting
Common Issues
- Port conflicts: Ensure port 3072 is available
- Memory errors: Verify sufficient RAM (28GB+ free)
- File permissions: Check access to database files
- Network issues: Verify domain resolution and routing
- Previous instances: Kill old servers before starting new ones
Performance Monitoring
- Monitor memory usage with top or htop
- Check network connections with netstat -an | grep 3072
- Review server logs for error messages or warnings
- Monitor response times for taxonomic queries
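Response times can be sampled with curl's write-out option. The helper below is a small illustration; the function name and the 30-second timeout are assumptions, and it measures a plain GET rather than a real taxonomic query.

```shell
# Illustrative helper: print the total time, in seconds, that curl needed
# to fetch a URL. Uses curl's %{time_total} write-out variable.
response_time() {
    curl -s -o /dev/null --max-time 30 -w '%{time_total}\n' "$1"
}

# Usage: take five samples, one second apart.
#   for i in 1 2 3 4 5; do
#       response_time https://refseq-sketch.jgi.doe.gov
#       sleep 1
#   done
```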
Related Tools
- taxserver.sh: Core taxonomy server implementation
- sendsketch.sh: Client for querying taxonomy servers
- comparesketch.sh: Local sketch comparison tool
- sketch.sh: Generate sketches from sequences