SendClade
Sends queries to a remote QuickClade server for taxonomic classification. Because the reference database is loaded once on the server rather than on each client, local memory requirements are small and there is no per-run database load, which matters most when many queries are processed. The client sends sequence data to a remote server running CladeServer with a preloaded reference database, receives the taxonomic classifications, and prints the results.
Key Advantages
The client-server design offers several key advantages:
- No local database loading - Saves gigabytes of memory
- Faster startup time - No database initialization required
- Consistent results - All users query the same reference database
- Centralized maintenance - Database updates managed on server side
- Ideal for batch processing - Efficient for processing many samples sequentially
SendClade mirrors the SendSketch architecture and provides the same taxonomic classification capabilities as QuickClade but with reduced local resource requirements. It is particularly useful in compute environments where memory is limited or when processing many samples sequentially.
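The sequential, many-sample use case described above can be scripted with a small loop. The following is a minimal sketch and not part of BBTools: the function name classify_dir, the SENDCLADE variable, and the directory layout are assumptions made for illustration. Only the in=, out=, and oneline arguments come from this page.

```shell
# Hypothetical helper: classify every .fa file in a directory,
# writing one tab-delimited result file per sample.
# SENDCLADE may be overridden (e.g. with a full path); it defaults to sendclade.sh on PATH.
SENDCLADE="${SENDCLADE:-sendclade.sh}"

classify_dir() {
    indir="$1"    # directory containing *.fa files
    outdir="$2"   # directory for per-sample .tsv results
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*.fa; do
        [ -e "$f" ] || continue            # skip when the glob matches nothing
        sample=$(basename "$f" .fa)
        # oneline requests one tab-delimited row per query
        "$SENDCLADE" in="$f" oneline out="$outdir/$sample.tsv" || {
            echo "failed: $f" >&2
            return 1
        }
    done
}
```

A call such as classify_dir bins results would then process each file in turn. Since overwrite defaults to f, rerunning into the same output directory may require adding overwrite=t.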
Basic Usage
sendclade.sh in=sequences.fasta
sendclade.sh in=sequences.fasta address=http://myserver.com:3069
sendclade.sh in=sequences.fasta hits=10 oneline out=results.tsv
sendclade.sh in=sequences.fasta local=t percontig=t minlen=1000
sendclade.sh in=bin1.fa,bin2.fa,bin3.fa hits=5 heap=10
Parameters
File Parameters
- in=<file,file>
- Query files or directories. Input can be fasta, fastq, .clade, or .spectra format. Pre-computed .clade/.spectra files are sent directly without sequence processing. Multiple files can be specified comma-separated, or loose file names are permitted as additional arguments.
- out=stdout
- Output file for results. If not specified, results are written to standard output. Progress messages always go to stderr.
- local=f
- Use local server at localhost:5002 instead of the default remote server. Useful for testing or when running your own CladeServer.
- address=<url>
- Specify custom server address. Should include full URL with protocol and port, e.g., http://myserver.com:3069/clade. If protocol is omitted, http:// is assumed. Default: https://bbmapservers.jgi.doe.gov/quickclade
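To preview the URL that an address= value resolves to under the rule above, the rule can be mirrored in a few lines of shell. This is an illustrative sketch only: normalize_address is a hypothetical helper, and how SendClade implements the rule internally is not described on this page.

```shell
# Hypothetical helper mirroring the documented rule:
# prepend http:// when the address has no protocol.
normalize_address() {
    case "$1" in
        http://*|https://*) printf '%s\n' "$1" ;;         # protocol given: keep as-is
        *)                  printf 'http://%s\n' "$1" ;;  # protocol omitted: assume http://
    esac
}

normalize_address "myserver.com:3069/clade"
normalize_address "https://bbmapservers.jgi.doe.gov/quickclade"
```

The first call prints the address with http:// prepended; the second prints the default server URL unchanged.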
Basic Parameters
- hits=1
- Number of top taxonomic hits to return per query. More hits provide alternative classifications but increase output size.
- oneline=f
- Print results in tab-delimited format with one line per query. Default format is human-readable with detailed information. Oneline format includes: QueryName, Q_GC, Q_Bases, Q_Contigs, RefName, R_TaxID, R_GC, R_Bases, R_Contigs, R_Level, GCdif, STRdif, k3dif, k4dif, k5dif, lineage.
- percontig=f
- Classify each contig (sequence) separately, so that each one gets its own taxonomic classification. When false, all sequences in a file are combined into a single query.
- minlen=0
- Minimum contig length in percontig mode. Contigs shorter than this threshold are ignored. Only applies when percontig=true.
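Before running with percontig=t, it can help to know how many contigs in a file would survive a given minlen. The sketch below counts them locally with awk. The helper name count_contigs_minlen is hypothetical, and it assumes plain (uncompressed) FASTA input; it does not contact the server.

```shell
# Hypothetical helper: count FASTA records with at least `minlen` bases.
# Usage: count_contigs_minlen contigs.fasta 1000
count_contigs_minlen() {
    fasta="$1"
    minlen="${2:-0}"
    awk -v min="$minlen" '
        /^>/ { if (seen && len >= min) n++; seen = 1; len = 0; next }  # close previous record
             { gsub(/[ \t\r]/, ""); len += length($0) }                 # add sequence line length
        END  { if (seen && len >= min) n++; print n + 0 }               # close last record
    ' "$fasta"
}
```

Contigs exactly at the threshold are counted, matching the description that only contigs shorter than minlen are ignored.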
Advanced Parameters
- heap=1
- Number of intermediate comparison results to store during processing. Higher values may improve accuracy for complex queries at the cost of increased memory usage on the server.
- printqtid=f
- Print query TaxID if present in sequence headers. Useful for benchmarking when query sequences have known taxonomic labels in the format 'tid_1234' or similar.
- banself=f
- Ban self-matches by ignoring records with the same TaxID as the query. Makes the program behave as if that organism is not in the reference database. Useful for testing accuracy.
- verbose=f
- Enable detailed progress reporting and timing information. Shows batch processing, server communication details, and performance metrics.
Standard BBTools Parameters
- overwrite=f
- Allow overwriting of existing output files.
- append=f
- Append to existing output files instead of overwriting.
Java Parameters
- -Xmx
- Set Java's memory usage. Default: 2g (fixed allocation for client operations). SendClade uses minimal memory as no reference database is loaded locally.
- -eoom
- Exit if an out-of-memory exception occurs. Requires Java 8u92+.
- -da
- Disable assertions.
Server Communication
The default server is: https://bbmapservers.jgi.doe.gov/quickclade
Queries are sent in batches of up to 4000 clades per request. The server responds with taxonomic classifications in either human-readable or tab-delimited format, depending on the oneline parameter.
Batch Processing
SendClade automatically batches queries to optimize network communication and server processing:
- Automatic batching - Groups up to 4000 clades per request
- Efficient encoding - Compresses clade data for network transmission
- Error handling - Validates HTTP response codes and provides detailed error messages
- Progress tracking - Reports timing information when verbose=true
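Given the 4000-clade limit per request listed above, the number of requests for a run can be estimated with ceiling division. This is a rough sketch under the assumption that batches are filled to the limit; requests_needed and BATCH_SIZE are hypothetical names, and the actual client may split batches differently.

```shell
# Estimate how many requests a run needs, assuming the documented
# limit of 4000 clades per request and fully filled batches.
BATCH_SIZE=4000

requests_needed() {
    n="$1"   # number of clades (queries) to send
    echo $(( (n + BATCH_SIZE - 1) / BATCH_SIZE ))   # ceiling of n / BATCH_SIZE
}

requests_needed 4000     # prints 1
requests_needed 4001     # prints 2
requests_needed 10000    # prints 3
```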
Examples
Basic Taxonomic Classification
sendclade.sh in=sequences.fasta
Classify sequences using the default remote server with human-readable output to stdout.
Machine-Readable Output with Multiple Hits
sendclade.sh in=sequences.fasta hits=10 oneline out=results.tsv
Get top 10 taxonomic matches per query in tab-delimited format suitable for downstream analysis.
Per-Contig Analysis with Length Filter
sendclade.sh in=sequences.fasta local=t percontig=t minlen=1000
Classify each contig separately using a local server, ignoring sequences shorter than 1000bp.
Multiple Files with Custom Server
sendclade.sh in=bin1.fa,bin2.fa,bin3.fa address=http://myserver.com:3069
Process multiple input files using a custom CladeServer instance.
Pre-computed Clade Files
sendclade.sh in=precomputed.clade hits=5 heap=10
Send pre-computed .clade or .spectra files directly to server without sequence processing overhead.
Benchmarking with Verbose Output
sendclade.sh in=labeled.fasta printqtid=t banself=t verbose=t out=benchmark.txt
Evaluate classification accuracy with labeled queries, excluding self-matches, and detailed timing information.
Output Format
Human-Readable Format (Default)
The default output provides detailed information for each query:
#Query1
query_name 1: 0.475 40 1 Cortinarius geophilus var. subauroreus 2764306 0.416 609 1 1 0.059 0.167 0.579 1.000 1.000 sk__Eukaryota;k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Cortinariaceae;g__Cortinarius;s__Cortinarius geophilus
Machine-Readable Format (oneline=t)
Tab-delimited format with header and one line per query:
#QueryName Q_GC Q_Bases Q_Contigs RefName R_TaxID R_GC R_Bases R_Contigs R_Level GCdif STRdif k3dif k4dif k5dif lineage
query_name 0.475 40 1 Cortinarius geophilus var. subauroreus 2764306 0.416 609 1 1 0.059 0.167 0.579 1.000 1.000 sk__Eukaryota;k__Fungi;...
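Because the oneline format carries a header, downstream scripts can select columns by name rather than by position. The following is a minimal sketch, assuming the header line starts with # and the fields are tab-separated as described above; select_columns is a hypothetical helper, not a BBTools command.

```shell
# Hypothetical helper: print selected columns from oneline output by header name.
# Usage: select_columns results.tsv QueryName RefName k5dif
select_columns() {
    file="$1"; shift
    awk -F '\t' -v want="$*" '
        BEGIN { nw = split(want, w, " "); OFS = "\t" }
        /^#/ && !hdr {                        # first header line: map names to indexes
            sub(/^#/, "", $1)
            for (i = 1; i <= NF; i++) col[$i] = i
            for (j = 1; j <= nw; j++)
                if (!(w[j] in col)) { print "missing column: " w[j] > "/dev/stderr"; exit 1 }
            hdr = 1; next
        }
        /^#/ { next }                         # skip any further comment lines
        hdr {
            out = ""
            for (j = 1; j <= nw; j++) out = out (j > 1 ? OFS : "") $(col[w[j]])
            print out
        }
    ' "$file"
}
```

Selecting by name keeps the script working if the column order ever changes, at the cost of failing loudly when a requested column is absent.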
Column Descriptions
- Q_GC
- Query GC content (fraction)
- Q_Bases
- Query sequence length in bases
- Q_Contigs
- Number of contigs in query
- RefName
- Reference organism name
- R_TaxID
- Reference NCBI taxonomic ID
- R_GC
- Reference GC content
- R_Bases
- Reference sequence length in bases
- R_Contigs
- Number of contigs in reference
- R_Level
- Taxonomic level of assignment
- GCdif
- Absolute GC content difference
- STRdif
- Strandedness difference
- k3dif, k4dif, k5dif
- K-mer frequency differences for 3-mer, 4-mer, and 5-mer comparisons. Lower k5dif values indicate higher confidence classifications.
- lineage
- Full taxonomic lineage from superkingdom to species
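The columns above can be combined to keep only closer matches and to pull a single rank out of the lineage. The sketch below assumes the oneline column order shown on this page (k5dif in column 15, lineage in column 16) and rank prefixes such as g__ as in the example lineage. The helper name confident_genus and the default cutoff of 0.1 are arbitrary illustrations, not recommended values.

```shell
# Hypothetical helper: keep rows whose k5dif is below a cutoff and print
# QueryName, k5dif, and the genus parsed from the lineage.
# Usage: confident_genus results.tsv 0.1
confident_genus() {
    file="$1"
    cutoff="${2:-0.1}"     # illustrative default, not a recommended threshold
    awk -F '\t' -v max="$cutoff" '
        /^#/ { next }                              # skip header line
        ($15 + 0) < max {
            genus = "NA"                           # used when no g__ entry is present
            n = split($16, ranks, ";")
            for (i = 1; i <= n; i++)
                if (ranks[i] ~ /^g__/) genus = substr(ranks[i], 4)
            print $1 "\t" $15 "\t" genus
        }
    ' "$file"
}
```

With hits greater than 1, each query may contribute several rows, so the output can contain more than one line per query.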
Performance Notes
SendClade is designed for high-throughput processing with minimal client-side resource requirements:
- Memory usage - Client uses only 2GB fixed allocation regardless of database size
- Network efficiency - Automatic batching optimizes bandwidth usage
- Server-side optimization - Preloaded databases and optimized comparison algorithms on server
- Verbose mode - Use verbose=true for detailed timing information during batch processing
- Pre-computed files - .clade and .spectra formats skip sequence processing for maximum speed
Support
Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.
For documentation and the latest version, visit: https://bbmap.org