startTaxServerVM
Virtual machine startup script for launching the BBTools taxonomy server with pre-configured settings optimized for JGI infrastructure deployment.
Purpose
This script is designed to start the BBTools taxonomy server in a virtual machine environment, specifically for deployment on jgi-web-1. It provides a production-ready configuration with:
- Optimized memory allocation (31GB) for VM constraints
- Background execution using nohup for persistent service
- Comprehensive logging to taxlogVM_55.txt
- Pre-configured domain and security settings for JGI infrastructure
- Auto-detection of taxonomy data files
- Kill code management for server lifecycle control
Prerequisites
System Requirements
- Target System: jgi-web-1 or compatible VM environment
- Memory: Minimum 32GB RAM (script allocates 31GB to Java)
- Java: Java 8 or later with large heap support
- Network: Port 3068 must be available and accessible
- File System: Access to /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/
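The requirements above can be checked before launch. The following is a minimal sketch, not part of the original script: the helper names (heap_fits, port_free, report) are hypothetical, the RAM check reads /proc/meminfo and so assumes Linux, and the port check assumes netstat is installed.

```shell
#!/bin/sh
# Pre-flight sketch. All helper names are illustrative, not part of BBTools.
TAXDIR=/global/projectb/sandbox/gaag/bbtools/jgi-bbtools
PORT=3068
HEAP_GB=31

# Succeeds if total RAM ($1, in kB) is strictly larger than the heap ($2, in GB).
heap_fits() {
    [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}

# Succeeds if nothing is listening on port $1; returns 2 if netstat is missing.
port_free() {
    command -v netstat >/dev/null 2>&1 || return 2
    ! netstat -ln 2>/dev/null | grep -q "[:.]$1[[:space:]]"
}

# Runs a check and prints one status line without aborting the script.
report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf '%-8s %s\n' "ok" "$label"
    else
        printf '%-8s %s\n' "FAILED" "$label"
    fi
}

# Total RAM in kB from /proc/meminfo (Linux only); 0 if it cannot be read.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
total_kb=${total_kb:-0}

report "RAM larger than ${HEAP_GB}GB heap" heap_fits "$total_kb" "$HEAP_GB"
report "java on PATH" command -v java
report "port $PORT free" port_free "$PORT"
report "taxserver.sh executable" test -x "$TAXDIR/taxserver.sh"
```

The RAM check compares against the 31GB heap rather than a nominal 32GB, because a machine sold as 32GB usually reports slightly less than that in MemTotal.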
Required Files
- taxserver.sh: Main taxonomy server script at /global/projectb/sandbox/gaag/bbtools/jgi-bbtools/
- Taxonomy Data: Auto-detected files including:
  - NCBI taxonomy tree (tree=auto)
  - GI number table (table=auto)
  - Accession files (accession=auto)
  - Genome size data (size=auto)
  - IMG integration data (img=auto)
  - Pattern files (pattern=auto)
Configuration
The script uses hardcoded configuration values optimized for the JGI production environment:
Server Configuration
- LOG=taxlogVM_55.txt
  Log file for all server output, including startup messages, query logs, and error information.
- PASS=xxxxx
  Security password for server management operations (masked in source). Used as both the kill code and the code for old-instance cleanup.
- DOMAIN=https://taxonomy.jgi.doe.gov
  Base domain URL displayed in server help messages and used for client redirects.
- KILL=https://taxonomy.jgi.doe.gov/kill/
  Endpoint URL for remotely terminating previous server instances during startup.
- PORT=3068
  HTTP port number for the taxonomy server. Standard port for JGI taxonomy services.
Java Configuration
- -da
  Disables Java assertions for improved production performance.
- -Xmx31g
  Sets the maximum Java heap size to 31GB, sized for the VM's memory while leaving the remainder for the operating system.
TaxServer Parameters
- port=3068
  HTTP server port number.
- verbose
  Enable detailed logging of server operations and query processing.
- accession=auto
  Automatically detect and load accession-to-taxonomy mapping files from default JGI locations.
- tree=auto
  Automatically detect and load NCBI taxonomy tree from default JGI location.
- table=auto
  Automatically detect and load GI-to-taxonomy table from default JGI location.
- size=auto
  Automatically detect and load genome size information from default JGI location.
- img=auto
  Automatically detect and load IMG database integration files from default JGI location.
- pattern=auto
  Automatically detect and load pattern files for efficient accession storage.
- prealloc
  Enable preallocation of data structures for faster server initialization and reduced memory fragmentation.
- domain=https://taxonomy.jgi.doe.gov
  Domain name displayed in server help messages and API responses.
- killcode=xxxxx
  Password for secure remote server termination via /kill/ endpoint.
- oldcode=xxxxx
  Password for terminating previous server instances during startup.
- oldaddress=https://taxonomy.jgi.doe.gov/kill/
  URL endpoint for sending termination commands to previous server instances.
- html
  Enable HTML formatting in server responses for web browser compatibility.
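Taken together, the settings above amount to a single command line. The sketch below assembles it from the documented values. The argument order, the exact redirection syntax, the function name launch_taxserver, and the RUN switch are assumptions made for illustration; the real script may differ. By default it only prints the command.

```shell
#!/bin/sh
# Sketch: the documented settings assembled into one command line.
LOG=taxlogVM_55.txt
PASS=xxxxx    # masked placeholder, exactly as in the source
DOMAIN=https://taxonomy.jgi.doe.gov
KILL=https://taxonomy.jgi.doe.gov/kill/
PORT=3068
TAXSERVER=/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh

launch_taxserver() {
    # Load the full argument list into the function's positional parameters.
    set -- "$TAXSERVER" -da -Xmx31g port="$PORT" verbose \
        accession=auto tree=auto table=auto size=auto img=auto pattern=auto \
        prealloc domain="$DOMAIN" killcode="$PASS" oldcode="$PASS" \
        oldaddress="$KILL" html

    if [ "${RUN:-0}" = 1 ]; then
        # Real launch: detach with nohup, send stdout and stderr to the log.
        nohup "$@" > "$LOG" 2>&1 &
    else
        # Dry run (default): print the command instead of executing it.
        echo "$*"
    fi
}

launch_taxserver
```

Keeping the values in variables at the top means the password and domain are each written once, even though PASS feeds both killcode and oldcode.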
Usage
Production Deployment
# On jgi-web-1 system:
./startTaxServerVM.sh
Starts the taxonomy server in background with production configuration. The server will:
- Launch in background using nohup for persistence
- Load all taxonomy data automatically from JGI standard locations
- Listen on port 3068 for HTTP requests
- Log all activity to taxlogVM_55.txt
- Attempt to terminate any existing server instances
Testing Mode
The script includes a commented testing configuration:
# For testing purposes only (commented out in production):
/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh -ea -Xmx8g port=3068 verbose accession=null tree=auto table=null
This testing mode uses reduced memory (8GB) and minimal data loading for development and debugging.
Process Management
Background Execution
The script uses nohup to ensure the server continues running even after the terminal session ends:
- Process isolation: Server runs independently of terminal
- Output redirection: Both stdout and stderr captured in log file
- Background operation: Terminal returns immediately after launch
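- Persistence: Server survives terminal disconnection. nohup alone does not restart the server after a system reboot; that requires a separately configured mechanism that reruns the script at boot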
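The behavior listed above can be observed with a harmless stand-in. In this sketch a short sh -c command plays the role of the server, and the log is a temporary file rather than taxlogVM_55.txt; both are assumptions made so the example can run anywhere. It also records the background PID with $!, which the description above does not mention but which is useful for later status checks.

```shell
#!/bin/sh
# Stand-in demo of the nohup pattern; the "server" here is a one-line command.
LOG=$(mktemp)

# Launch in the background, detached from the terminal's hangup signal.
# "> file 2>&1" sends stdout to the file, then points stderr at the same place.
nohup sh -c 'echo "startup message"; echo "error message" >&2' > "$LOG" 2>&1 &

# $! holds the PID of the most recent background job.
PID=$!
echo "started stand-in with PID $PID"

# The real server keeps running; the stand-in exits at once, so wait for it.
wait "$PID"

# Both lines appear in the log, showing that stderr was captured as well.
cat "$LOG"
```

The order of the redirections matters: writing 2>&1 before > "$LOG" would leave stderr attached to the terminal.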
Server Lifecycle
- Cleanup: Attempts to terminate previous server instances using oldcode/oldaddress
- Initialization: Loads taxonomy data files (may take several minutes)
- Service: Begins accepting HTTP requests on configured port
- Logging: All operations logged continuously to taxlogVM_55.txt
- Termination: Can be stopped remotely using /kill/ endpoint with proper credentials
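Because initialization can take several minutes, a caller that needs the service usually has to wait for it. The following sketch polls until a probe command succeeds or a deadline passes. wait_until is a hypothetical helper, and using the /help request shown under Process Status as the probe is an assumption about what indicates readiness.

```shell
#!/bin/sh
# Sketch: retry a probe command until it succeeds or a timeout expires.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    elapsed=0
    # Re-run the probe until it exits with status 0.
    while ! "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + interval_s))
        if [ "$elapsed" -gt "$timeout_s" ]; then
            return 1    # gave up: probe never succeeded in time
        fi
        sleep "$interval_s"
    done
    return 0
}

# Example (commented out so the sketch has no side effects):
# wait up to 15 minutes, probing every 10 seconds.
# wait_until 900 10 curl -sf http://localhost:3068/help && echo "server is up"
```

Passing the probe as arguments keeps the loop independent of curl, so the same helper works with any command that exits 0 once the server is ready.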
Monitoring and Troubleshooting
Log Monitoring
# Monitor server startup and activity:
tail -f taxlogVM_55.txt
# Check for errors during initialization:
grep -i error taxlogVM_55.txt
# Monitor server performance:
grep -iE "memory|heap|gc" taxlogVM_55.txt
Process Status
# Check if server is running:
# (the brackets stop grep from matching its own command line)
ps aux | grep '[T]axServer'
# Check port availability:
netstat -ln | grep 3068
# Test server response:
curl http://localhost:3068/help
Common Issues
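- Memory Issues: If the loaded data does not fit in the 31GB heap, Java throws OutOfMemoryError; if the VM has less free RAM than the heap needs, the JVM may fail to start or be killed by the operating system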
- Port Conflicts: Another service using port 3068 will prevent startup
- File Access: Missing or inaccessible taxonomy data files will cause initialization failure
- Permissions: Inadequate file system permissions for log file creation
- Java Version: Incompatible Java version may cause runtime issues
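Several of these issues leave recognizable traces in the log. The sketch below scans a log file for standard Java error strings and prints one line per likely cause. diagnose_log is a hypothetical helper, and it is an assumption that the server writes these exceptions to the log verbatim; the strings themselves are ordinary JVM messages, not BBTools-specific output.

```shell
#!/bin/sh
# Sketch: map common JVM error strings in a log to the issues listed above.
diagnose_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "log not readable: $log"
        return 1
    fi
    if grep -q "java.lang.OutOfMemoryError" "$log"; then
        echo "memory: OutOfMemoryError found"
    fi
    if grep -q "Address already in use" "$log"; then
        echo "port: bind failure found"
    fi
    if grep -q "FileNotFoundException" "$log"; then
        echo "files: missing or inaccessible input found"
    fi
    if grep -q "UnsupportedClassVersionError" "$log"; then
        echo "java: class version mismatch found"
    fi
    return 0
}

# Example (commented out): diagnose_log taxlogVM_55.txt
```

An unreadable log is reported separately because it usually points to the permissions issue rather than to a server fault.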
Security Considerations
Access Control
- Internal Network: Designed for deployment within JGI internal network
- Kill Code Protection: Server termination requires knowledge of password
- Domain Restriction: Configured for specific JGI domain endpoints
- File System Access: Limited to predefined data directories
Production Hardening
- Change default passwords before deployment
- Configure firewall rules for port 3068
- Set up log rotation for taxlogVM_55.txt
- Monitor resource usage and set appropriate limits
- Implement backup procedures for taxonomy data
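For the log rotation item, one option is a copy-and-truncate rotation, which avoids restarting a server that holds the log open. This is a minimal sketch under that assumption; rotate_log is a hypothetical helper, and a production setup would more likely use the system's logrotate facility.

```shell
#!/bin/sh
# Sketch: copy-and-truncate rotation that keeps numbered older copies.
# Usage: rotate_log FILE [KEEP]   (KEEP defaults to 5 copies)
rotate_log() {
    log=$1
    keep=${2:-5}
    # Nothing to do if the log is missing or empty.
    [ -s "$log" ] || return 0
    # Shift existing copies up by one: .1 -> .2, .2 -> .3, and so on.
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$log.$prev" ]; then
            mv "$log.$prev" "$log.$i"
        fi
        i=$prev
    done
    # Copy the live log, then empty it in place so the writer keeps its handle.
    cp "$log" "$log.1"
    : > "$log"
}

# Example (commented out): rotate_log taxlogVM_55.txt 5
```

Two caveats apply. Lines written between the copy and the truncation are lost. In addition, if the server's output was redirected with > rather than >>, the writer keeps its old file offset after truncation and the log becomes a sparse file; launching with >> avoids this.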
Related Tools
- taxserver.sh: Main taxonomy server script that this wrapper calls
- comparesketch.sh: Client tool for sketch-based sequence comparison
- sendsketch.sh: Client tool for sending sketches to taxonomy server
- gi2taxid.sh: Tool for GI number to taxonomy ID conversion
Algorithm Details
Startup Process
The script implements a robust server startup procedure:
- Environment Check: Validates system requirements and file paths
- Previous Instance Cleanup: Sends kill command to any existing server using oldcode/oldaddress
- Data Loading: Automatically detects and loads all required taxonomy data files:
  - NCBI taxonomy tree structure
  - GI number to taxonomy ID mappings
  - Accession number to taxonomy ID mappings
  - Genome size information
  - IMG database integration files
  - Compressed pattern files for efficiency
- Memory Optimization: Preallocates data structures to minimize memory fragmentation
- Service Activation: Binds to port 3068 and begins accepting HTTP requests
- Background Execution: Detaches from terminal using nohup for persistent operation
Performance Characteristics
- Memory Usage: 31GB heap allocation optimized for VM environment
- Startup Time: 5-15 minutes depending on data size and storage performance
- Data Loading: Sequential loading of taxonomy files with progress logging
- Concurrent Handling: Multi-threaded HTTP server supporting simultaneous requests
- Resource Monitoring: Built-in memory and performance tracking
Fault Tolerance
- Graceful Degradation: Server continues with available data even if some files are missing
- Error Logging: Comprehensive error reporting to log file
- Recovery Mechanisms: Automatic cleanup of previous instances
- Resource Limits: The -Xmx31g cap bounds the Java heap, so heap growth cannot exhaust system memory
Notes
- This script is specifically designed for JGI infrastructure deployment
- Hardcoded paths and configurations may need adaptation for other environments
- The script requires significant system resources (31GB Java heap, so at least 32GB of RAM)
- Startup time can be substantial due to large taxonomy data loading
- Production deployment should include appropriate monitoring and alerting
- Log file rotation should be configured to prevent disk space issues
- Server provides both taxonomy lookup and sketch comparison services
- API endpoints follow RESTful conventions for integration with web applications
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org
- Related Guides: BBTools/docs/guides/TaxonomyGuide.txt and BBSketchGuide.txt
- Main Tool: taxserver.sh documentation