StartNtServerVM
startNtServerVM.sh is the production startup script for the BBTools taxonomy server configured with the NT (nucleotide) database. It launches a taxonomy server instance with a 9 GB Java heap that answers remote, sketch-based taxonomic classification requests as a web service at JGI.
Overview
The startNtServerVM.sh script is a specialized configuration launcher for the BBTools taxonomy server (taxserver.sh) that:
- Configures the server to use the NT (NCBI nucleotide) database for taxonomic classification
- Sets up high-memory allocation (9GB) for production workloads
- Enables sketch-only mode for fast k-mer based taxonomic identification
- Provides remote management capabilities with kill codes
- Runs as a background daemon with logging
Prerequisites
System Requirements
- BBTools installation with taxserver.sh available
- Enough RAM to back a 9 GB Java heap (the script sets -Xmx9g)
- NT database files and taxonomy tree data
- Network connectivity for web service operation
- JGI production server environment (jgi-web-4)
Required Files
- NT database sketches and taxonomy data
- BBTools taxonomy server (taxserver.sh)
- Java runtime environment
Configuration Parameters
The script uses the following hardcoded configuration:
Server Configuration
- LOG=ntlogVM_32.txt
  - Log file for server output and error messages
- PORT=3071
  - Network port for the taxonomy server to listen on
- DB=nt
  - Database identifier specifying the NT (nucleotide) database
- DOMAIN=https://nt-sketch.jgi.doe.gov
  - Web domain for the service, used in help messages and API documentation
Security Configuration
- PASS=xxxxx
  - Kill code password for remote server management (xxxxx is a placeholder; replace it with the actual password before launching)
- KILL=https://nt-sketch.jgi.doe.gov/kill/
  - URL endpoint for remote server termination
K-mer Configuration
- k=32,24
  - Dual k-mer lengths used for sketch generation and comparison: 32-mers for specificity and 24-mers for sensitivity
Server Launch Parameters
The script launches taxserver.sh with the following configuration:
Java Virtual Machine Settings
- -da
  - Disable Java assertions for production performance
- -Xmx9g
  - Set maximum Java heap size to 9 gigabytes
Taxonomy Server Settings
- port=$PORT
  - Network port (3071) for HTTP service
- verbose
  - Enable verbose logging for debugging and monitoring
- tree=auto
  - Automatically locate taxonomy tree files
- sketchonly
  - Enable sketch-only mode for fast k-mer based classification without full taxonomy name hashing
- index
  - Build or load database index for faster queries
- domain=$DOMAIN
  - Set the service domain for API documentation
- killcode=$PASS
  - Set remote kill password for server management
- oldcode=$PASS
  - Password for terminating any existing server instance
- oldaddress=$KILL
  - URL to send termination request to existing server
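To see how these settings combine, the following dry-run sketch assembles the documented options into a single command line and prints it without launching anything. The values come from this document; the argument order, the TAXSERVER and LAUNCH_CMD variable names, and passing the database as a bare trailing argument (as the testing command under Usage does) are assumptions rather than a copy of the actual script.

```shell
#!/bin/sh
# Dry-run sketch: build the documented taxserver.sh command line and print it.
# Nothing is launched. Argument order is an assumption.

PORT=3071
DB=nt
DOMAIN=https://nt-sketch.jgi.doe.gov
PASS=xxxxx   # placeholder kill code, as in the documented configuration
KILL=https://nt-sketch.jgi.doe.gov/kill/

# Path taken from the testing command shown under Usage (assumed to be the same
# in production).
TAXSERVER=/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh

# Load the arguments into the positional parameters so each one stays a single word.
set -- "$TAXSERVER" -da -Xmx9g \
    "port=$PORT" verbose tree=auto sketchonly index \
    "domain=$DOMAIN" "killcode=$PASS" \
    "oldcode=$PASS" "oldaddress=$KILL" \
    "$DB" k=32,24

# Join the arguments with spaces for display. With a real kill code in PASS,
# avoid printing this line to a shared terminal or log.
LAUNCH_CMD="$*"
echo "$LAUNCH_CMD"
```

Printing the command before running it is a cheap way to confirm that every variable expanded as intended, in particular that PASS no longer holds the placeholder.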
Usage
Production Launch
# Run on jgi-web-4 server
bash startNtServerVM.sh
Testing Mode
The script includes a commented-out, simplified command for testing:
# Uncomment and modify for testing
/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh \
-ea -Xmx9g port=3071 verbose tree=auto sketchonly nt k=32,24 index=f
Server Management
# Check if server is running
curl https://nt-sketch.jgi.doe.gov/
# Kill server remotely (requires password)
curl https://nt-sketch.jgi.doe.gov/kill/[password]
# Monitor server logs
tail -f ntlogVM_32.txt
Service Architecture
Daemon Process
The server runs as a background daemon using nohup with the following characteristics:
- Background execution: Uses & to run in background
- Log redirection: Both stdout and stderr redirected to log file
- Persistent operation: Continues running after shell termination
- Append logging: Logs append to existing log file (>>)
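The four characteristics above correspond to one shell idiom. The sketch below demonstrates it with a harmless stand-in command in place of the real taxserver.sh invocation, so it can be tried safely. The PID file and the SERVER_PID variable are hypothetical additions for illustration; this document does not say whether the script records the process ID.

```shell
#!/bin/sh
# Sketch of the daemon launch pattern, using a stand-in command instead of
# the real taxserver.sh command line.

LOG=$(mktemp)          # stand-in for ntlogVM_32.txt
PIDFILE="$LOG.pid"     # hypothetical PID file; the original script may not write one

# nohup  ignore hangup signals so the process survives the shell exiting
# >>     append stdout to the log instead of truncating it
# 2>&1   send stderr to the same file as stdout
# &      run in the background
nohup sh -c 'echo "server starting"; echo "warning on stderr" >&2' >> "$LOG" 2>&1 &

SERVER_PID=$!                      # PID of the background job just started
echo "$SERVER_PID" > "$PIDFILE"    # record it for manual termination later

wait "$SERVER_PID"                 # demo only: a real server would keep running
cat "$LOG"
```

Because the redirection uses >> rather than >, restarting the server keeps earlier log entries, which is what makes the log usable as an audit trail.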
Web Service API
Once launched, the server provides REST API endpoints for:
- Taxonomic classification of sequences
- Sketch-based similarity comparisons
- Batch processing of FASTA/FASTQ files
- Database information queries
- Server health and status checks
Performance Characteristics
Memory Usage
- Java Heap: 9GB maximum allocation
- Database Loading: NT database requires significant memory for sketches
- Query Processing: Additional memory used during active queries
Processing Capability
- K-mer Strategy: Dual k-mer lengths (32,24) balance specificity (32-mers) and sensitivity (24-mers)
- Sketch Mode: Faster than full taxonomy name resolution
- Concurrent Queries: Handles multiple simultaneous classification requests
- Database Coverage: NT database provides comprehensive nucleotide sequence coverage
Monitoring and Troubleshooting
Log Analysis
# Monitor real-time activity
tail -f ntlogVM_32.txt
# Check for errors
grep -i error ntlogVM_32.txt
# Monitor memory usage
grep -i "memory\|heap\|gc" ntlogVM_32.txt
Common Issues
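- Out of Memory: If the log reports java.lang.OutOfMemoryError, raise the -Xmx value (currently 9g), provided the host has enough physical RAM
- Port Conflicts: Confirm that no other process is listening on port 3071 before launch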
- Database Loading: Verify NT database files are accessible
- Network Issues: Check firewall and network connectivity
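Before raising -Xmx in response to an out-of-memory error, it helps to confirm that the host actually has that much physical memory. The following is a minimal sketch of such a check, assuming a Linux host that exposes /proc/meminfo; the has_enough_ram function is hypothetical and not part of BBTools. The JVM also needs memory beyond the heap, so treat the threshold as a lower bound.

```shell
#!/bin/sh
# has_enough_ram REQUIRED_GB [MEMINFO_FILE]
# Succeeds if MemTotal in the meminfo file (default /proc/meminfo, Linux only)
# is at least REQUIRED_GB gigabytes. Hypothetical helper, not part of BBTools.
has_enough_ram() {
    required_kb=$(( $1 * 1024 * 1024 ))
    meminfo=${2:-/proc/meminfo}
    # The relevant line looks like: "MemTotal:       16384000 kB"
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo" 2>/dev/null)
    [ -n "$total_kb" ] && [ "$total_kb" -ge "$required_kb" ]
}

if has_enough_ram 9; then
    echo "host reports at least 9 GB of RAM"
else
    echo "host reports less than 9 GB of RAM, or MemTotal could not be read"
fi
```

The optional second argument exists only so the function can be exercised against a sample file instead of the live system.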
Health Checks
# Test server responsiveness
curl -s https://nt-sketch.jgi.doe.gov/ | head
# Check process status (the [t] keeps grep from matching its own process)
ps aux | grep -i '[t]axserver'
# Monitor port usage
netstat -tuln | grep 3071
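For unattended monitoring, the responsiveness test can be wrapped in a function that reports a simple status and exit code. This is a sketch under the assumption that any successful HTTP response from the root URL means the server is up; check_server is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# check_server URL
# Prints UP and returns 0 if the URL answers without an HTTP error within
# 10 seconds; otherwise prints DOWN and returns 1.
check_server() {
    # -s          silent, no progress meter
    # -f          treat HTTP error responses (400 and above) as failures
    # --max-time  give up after 10 seconds
    # -o          discard the response body
    if curl -s -f --max-time 10 -o /dev/null "$1"; then
        echo "UP"
    else
        echo "DOWN"
        return 1
    fi
}

# Example (not executed here):
#   check_server https://nt-sketch.jgi.doe.gov/ || echo "taxonomy server unreachable"
```

The non-zero return value lets a scheduler such as cron, or a wrapper script, trigger an alert or a restart.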
Security Considerations
Access Control
- Kill Code: Replace the xxxxx placeholder in PASS with a strong password; anyone who knows it can shut the server down through the kill URL
- Network Security: Firewall rules should restrict access appropriately
- Service Isolation: Run with appropriate user privileges
Configuration Security
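One way to act on the kill code advice is to keep the real value out of the script file and load it from a private file instead. Because the script passes the same PASS value as both killcode and oldcode, the code has to stay the same between launches, so this sketch generates it once and reuses it afterwards. The generate_killcode and load_killcode functions and the file location are hypothetical; neither the script nor taxserver.sh is known to do this.

```shell
#!/bin/sh
# generate_killcode: print 32 hex characters (128 random bits).
# Hex is safe to embed in the /kill/ URL path.
generate_killcode() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# load_killcode FILE: print the kill code stored in FILE, creating the file
# with owner-only permissions on first use.
load_killcode() {
    file=$1
    if [ ! -s "$file" ]; then
        # umask 077 inside a subshell so only this file is affected
        ( umask 077; generate_killcode > "$file" )
    fi
    cat "$file"
}

# Demo with a temporary directory; a real deployment would use a fixed,
# private path readable only by the service account.
DEMO_DIR=$(mktemp -d)
DEMO_FILE="$DEMO_DIR/killcode"
PASS=$(load_killcode "$DEMO_FILE")
echo "kill code length: ${#PASS}"
```

Keeping the code in a separate file also means the script itself can be shared or version-controlled without exposing the password.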
Integration with BBTools Ecosystem
Related Tools
- taxserver.sh: The underlying taxonomy server implementation
- sendsketch.sh: Client tool for sending queries to the server
- comparesketch.sh: Local sketch comparison functionality
- sketch.sh: Sketch generation tool
JGI Infrastructure
This script is part of JGI's web service infrastructure providing:
- Public taxonomic classification services
- Integration with JGI analysis pipelines
- Support for research community sequence analysis
Production vs Testing Modes
Production Mode (Default)
- Background daemon execution with nohup
- Full logging to ntlogVM_32.txt
- Remote management capabilities enabled
- Automatic database indexing
Testing Mode (Commented)
- Foreground execution for debugging
- Assertions enabled (-ea instead of -da)
- Index building disabled (index=f)
- Simplified configuration
Output and Logging
Log Files
- ntlogVM_32.txt: Combined stdout and stderr from the server, including:
  - Server startup messages and initialization progress
  - Query processing statistics
  - Error messages and warnings
  - Memory usage and garbage collection information
Process Management
- Background process continues after terminal disconnection
- Process ID should be recorded for manual termination if needed
- Logs provide audit trail of server activity