Start Silva Server VM
Server startup script for the Silva ribosomal RNA identification service. This script launches a taxonomic server instance on JGI infrastructure (jgi-web-4) that provides ribosomal RNA sequence identification services using Silva database sketches.
Overview
This script starts a taxonomic server for Silva ribosomal RNA identification using BBTools' taxserver.sh. The server provides a web-based interface for identifying ribosomal RNA sequences through k-mer sketching against the Silva database. It's specifically configured to run on JGI's jgi-web-4 server infrastructure.
Server Configuration
Service Settings
- LOG
- Log file: ribologVM_32.txt - Contains server output and error messages
- PASS
- Authentication password: xxxxx (placeholder - real password configured in production)
- DOMAIN
- Service domain: https://ribo-sketch.jgi.doe.gov - Public endpoint for the service
- KILL
- Kill URL: https://ribo-sketch.jgi.doe.gov/kill/ - Administrative endpoint for server termination
- PORT
- Service port: 3073 - Internal port for the taxonomic server
- REF
- Reference sketches: /global/projectb/sandbox/gaag/bbtools/silva/latest/both_seq#.sketch - Silva database sketches
- DB
- Database name: silva - Identifier for the Silva ribosomal RNA database
Server Parameters
Taxserver Configuration
- -da
- JVM assertion flag - Disables assertions for production performance
- -Xmx10g
- Java heap memory: 10GB - Maximum memory allocation for the server process
- port=$PORT
- Server port: Uses PORT variable (3073) for incoming connections
- verbose
- Verbose logging: Enables detailed output for debugging and monitoring
- tree=auto
- Taxonomic tree: Loads the taxonomy tree from its default BBTools location
- sketchonly
- Sketch-only mode: Uses only k-mer sketches for identification (faster)
- index
- Index: Builds a k-mer index over the reference sketches for faster queries, at the cost of extra memory and startup time
- whitelist
- Access control: Enables IP whitelisting for security
- domain=$DOMAIN
- Web domain: Sets the public domain for web interface access
- killcode=$PASS
- Kill authentication: Password required for server termination
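- oldcode=$PASS
- Prior-instance kill code: Password of a previously running server instance, used to shut that instance down
- oldaddress=$KILL
- Prior-instance kill URL: After it finishes initializing, the new server sends oldcode to this address to stop the previous instance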
- ref=$REF
- Reference database: Path to Silva sketch files for sequence matching
- dbname=Silva
- Database display name: Human-readable name shown in web interface
- blacklist=silva
- K-mer blacklist: Uses the Silva blacklist bundled with BBTools, which excludes overly common k-mers that would otherwise produce uninformative hits
- k=32,24
- K-mer sizes: Uses both 32-mer and 24-mer sketches for identification accuracy
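The parameters above can be assembled into a launch script. The sketch below is a reconstruction under assumptions, not the production startSilvaServerVM.sh: it reuses the taxserver.sh path from the testing command, reads the password from a PASS environment variable, and the function names taxserver_args and start_server are hypothetical. Building the arguments as a bash array keeps values such as the ref path intact.

```shell
#!/usr/bin/env bash
# Settings from Service Settings. ASSUMPTION: production uses the same
# taxserver.sh path as the commented-out testing command.
TAXSERVER=${TAXSERVER:-/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh}
LOG=ribologVM_32.txt
PASS=${PASS:-xxxxx}   # placeholder: supply the real password via the environment
DOMAIN=https://ribo-sketch.jgi.doe.gov
KILL=https://ribo-sketch.jgi.doe.gov/kill/
PORT=3073
REF='/global/projectb/sandbox/gaag/bbtools/silva/latest/both_seq#.sketch'

# Print the taxserver.sh arguments one per line (useful for review and dry runs).
taxserver_args() {
  local args=(
    -da -Xmx10g
    "port=$PORT" verbose tree=auto sketchonly index whitelist
    "domain=$DOMAIN" "killcode=$PASS" "oldcode=$PASS" "oldaddress=$KILL"
    "ref=$REF" dbname=Silva blacklist=silva k=32,24
  )
  printf '%s\n' "${args[@]}"
}

# Start the server detached from the terminal; stdout and stderr go to $LOG.
start_server() {
  local args
  mapfile -t args < <(taxserver_args)
  nohup "$TAXSERVER" "${args[@]}" > "$LOG" 2>&1 &
  echo "taxserver.sh started with PID $!"
}
```

To review the command without starting anything, source the file and run taxserver_args; start_server performs the actual nohup launch.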
Prerequisites
System Requirements
- Server: JGI jgi-web-4 infrastructure
- Java: Java 8 or later with 10GB+ heap space
- BBTools: Complete BBTools installation with taxserver.sh
- Memory: Minimum 12GB RAM (10GB for Java heap + system overhead)
- Storage: Access to Silva sketch database files
Database Requirements
- Silva Sketches: Pre-computed k-mer sketches of Silva ribosomal RNA sequences
- Path Access: Read access to /global/projectb/sandbox/gaag/bbtools/silva/latest/
- File Pattern: Sketch files matching both_seq#.sketch, where # is the BBTools placeholder for the file number of each numbered sketch file
- Database Version: Latest Silva release with taxonomic annotations
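Before starting the server, it can help to confirm that the ref pattern matches readable files. This sketch assumes that # in the ref path stands for a file number, so it rewrites # to a * glob; check_sketches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Verify that at least one readable sketch file matches a BBTools-style ref path.
# ASSUMPTION: '#' in the path is a file-number placeholder, treated here as '*'.
check_sketches() {
  local pattern found=0 f
  pattern=$(printf '%s' "$1" | tr '#' '*')   # both_seq#.sketch -> both_seq*.sketch
  for f in $pattern; do        # left unquoted on purpose so the glob expands
    [[ -e $f ]] || continue    # no match: bash returns the literal pattern
    if [[ ! -r $f ]]; then
      echo "not readable: $f" >&2
      return 1
    fi
    found=$((found + 1))
  done
  if (( found == 0 )); then
    echo "no sketch files match $1" >&2
    return 1
  fi
  echo "$found sketch file(s) found"
}
```

For example: check_sketches '/global/projectb/sandbox/gaag/bbtools/silva/latest/both_seq#.sketch'.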
Network Requirements
- Port Access: Port 3073 must be available for binding
- Domain Registration: DNS entry for ribo-sketch.jgi.doe.gov
- SSL Certificate: Valid certificate for HTTPS service
- Firewall Rules: Appropriate access controls for public service
Usage
Starting the Server
Run the script directly on the designated server:
./startSilvaServerVM.sh
The server will start in the background using nohup and log output to ribologVM_32.txt.
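Because nohup returns immediately, the server is not necessarily accepting queries when the start script exits. A minimal readiness check polls the port using bash's /dev/tcp redirection. It assumes that an accepted TCP connection on port 3073 means the server is ready; wait_for_port is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Poll until HOST:PORT accepts TCP connections, or give up after TIMEOUT seconds.
# Uses bash's /dev/tcp redirection (available in most Linux builds of bash).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-120} elapsed=0
  while (( elapsed < timeout )); do
    # Opening /dev/tcp/HOST/PORT attempts a connection; the subshell closes it again.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "port $port on $host is accepting connections"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s waiting for $host:$port" >&2
  return 1
}
```

For example, run wait_for_port localhost 3073 600 after ./startSilvaServerVM.sh. The log remains the place to confirm that the sketches finished loading.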
Testing Mode
For development and testing, the script includes a simplified command (commented out):
# Simple mode for testing
/global/projectb/sandbox/gaag/bbtools/jgi-bbtools/taxserver.sh -ea -Xmx10g \
port=3073 verbose tree=auto sketchonly silva k=32,24 index=f \
domain=https://ribo-sketch.jgi-psf.org
This testing mode uses:
- -ea: Enables assertions for debugging
- index=f: Skips building the k-mer index, for faster startup
- Alternative domain: jgi-psf.org instead of jgi.doe.gov
- No authentication: Simplified configuration for testing
Service Management
Monitoring
- Log File: Monitor ribologVM_32.txt for server status and errors
- Process Status: Use ps aux | grep taxserver to check running status
- Port Status: Use netstat -tlnp | grep 3073 (or ss -tlnp | grep 3073 where netstat is unavailable) to verify port binding
- Web Interface: Access https://ribo-sketch.jgi.doe.gov for service availability
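For routine checks of ribologVM_32.txt, a small filter can surface likely failures. This sketch assumes problems appear as lines containing Exception or Error, as Java stack traces usually do; the exact messages taxserver.sh writes are not documented here, and scan_log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count log lines that look like Java errors and show the most recent ones.
# ASSUMPTION: failures surface as lines containing "Exception" or "Error".
# Exit status: 0 = clean, 1 = suspicious lines found, 2 = log file missing.
scan_log() {
  local log=${1:-ribologVM_32.txt} n=${2:-5} count
  if [[ ! -f $log ]]; then
    echo "log file not found: $log" >&2
    return 2
  fi
  count=$(grep -cE 'Exception|Error' "$log")
  echo "$count suspicious line(s) in $log"
  # Show the n most recent matching lines.
  grep -E 'Exception|Error' "$log" | tail -n "$n"
  (( count == 0 ))
}
```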
Stopping the Server
- Administrative Stop: Use the kill URL with authentication password
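- Process Kill: Find the Java process started by taxserver.sh (e.g., with ps aux | grep taxserver) and terminate it
- Graceful Shutdown: Send SIGTERM (the default signal of kill) first so the JVM can shut down cleanly; use SIGKILL only if the process does not exit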
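The graceful-then-forced shutdown can be scripted. stop_gracefully is a hypothetical helper that sends SIGTERM, waits up to TIMEOUT seconds (default 30, an arbitrary choice), and sends SIGKILL only to processes that are still alive.

```shell
#!/usr/bin/env bash
# Send SIGTERM to the given PIDs, wait up to $TIMEOUT seconds for each,
# then SIGKILL any process that is still running.
stop_gracefully() {
  local timeout=${TIMEOUT:-30} pid waited
  (( $# > 0 )) || { echo "usage: stop_gracefully PID..." >&2; return 2; }
  kill -TERM "$@" 2>/dev/null
  for pid in "$@"; do
    waited=0
    while kill -0 "$pid" 2>/dev/null && (( waited < timeout )); do
      sleep 1
      waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      echo "PID $pid ignored SIGTERM for ${timeout}s; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
    fi
  done
}
```

To find the PID, pgrep -f tax.TaxServer may work, assuming the JVM command line contains that class name; confirm with ps aux | grep taxserver before killing anything.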
Troubleshooting
- Port Conflicts: Ensure port 3073 is not in use by other services
- Memory Issues: Monitor system memory usage; increase -Xmx if needed
- Database Access: Verify read permissions for Silva sketch files
- Network Connectivity: Test domain resolution and SSL certificate validity
- Authentication: Verify password configuration for kill functionality
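The memory guideline from Prerequisites (10GB heap plus system overhead, 12GB minimum) can be checked before launch. This sketch reads MemAvailable from /proc/meminfo, so it is Linux-only; has_enough_memory is a hypothetical name, and its second argument lets you point it at any meminfo-format file.

```shell
#!/usr/bin/env bash
# Succeed only if MemAvailable is at least MIN_GB GiB (default 12).
# Reads a meminfo-format file (default /proc/meminfo, values in kB).
has_enough_memory() {
  local min_gb=${1:-12} meminfo=${2:-/proc/meminfo} avail_kb avail_gb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  if [[ -z $avail_kb ]]; then
    echo "MemAvailable not found in $meminfo" >&2
    return 2
  fi
  avail_gb=$(( avail_kb / 1024 / 1024 ))   # kB -> GiB, rounded down
  echo "available: ${avail_gb} GiB, required: ${min_gb} GiB"
  (( avail_gb >= min_gb ))
}
```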
Security Considerations
Access Control
- Whitelist: Server implements IP-based access control
- Authentication: Kill functionality requires password authentication
- HTTPS: All communication encrypted via SSL/TLS
- Process Isolation: Server runs as dedicated service user
Data Protection
- Read-Only Database: Silva sketches accessed in read-only mode
- No User Data Storage: Query sequences processed in memory only
- Log Rotation: Regular rotation of log files to prevent disk filling
- Resource Limits: -Xmx10g caps the Java heap; total process memory is somewhat higher because the JVM also uses off-heap memory
Service Architecture
Component Overview
The Silva server provides ribosomal RNA identification through the following components:
- Web Interface: HTTPS endpoint for sequence submission and results
- Sketch Engine: K-mer sketching for fast sequence comparison
- Taxonomic Database: Silva ribosomal RNA sequences with taxonomic annotations
- Query Processing: Real-time sequence analysis and classification
- Result Formatting: Structured output with taxonomic assignments
Performance Characteristics
- Query Speed: Sub-second response for typical rRNA sequences
- Throughput: Multiple concurrent queries supported
- Memory Usage: 10GB heap for sketch indexing and query processing
- Accuracy: Dual k-mer strategy (32,24) for robust identification
- Database Coverage: Comprehensive Silva ribosomal RNA collection
Integration
API Access
The server is queried over HTTP/HTTPS at the configured domain. The usual client is sendsketch.sh, which converts FASTA input into a sketch and submits it for taxonomic identification against the Silva database.
Related Tools
- taxserver.sh: Core taxonomic server implementation
- sketch.sh: Manual sketching tool for sequence analysis
- sendsketch.sh: Client tool for submitting queries to taxonomic servers
- comparesketch.sh: Local sketch comparison and analysis
Database Updates
Silva database sketches should be updated periodically to reflect new Silva releases. The server should be restarted after sketch updates to load the latest data.
Notes
- This script is specifically designed for JGI's production infrastructure
- Hardcoded paths and configurations require modification for other environments
- The service provides public access to Silva-based ribosomal RNA identification
- Regular monitoring of log files and system resources is recommended
- Authentication passwords should be properly secured in production deployments