SendClade

Script: sendclade.sh Package: clade Class: SendClade.java

Sends taxonomic queries to a remote QuickClade server for classification. Because the reference database is preloaded on the server, users can classify sequences without loading it locally, which reduces client memory requirements and avoids repeated database loading when running multiple queries. The client sends sequence data to a remote server running CladeServer, receives taxonomic classifications, and displays the results.

Key Advantages

SendClade mirrors the SendSketch architecture and provides the same taxonomic classification capabilities as QuickClade, but with reduced local resource requirements. It is particularly useful in compute environments where memory is limited or when processing many samples sequentially.

Basic Usage

sendclade.sh in=sequences.fasta
sendclade.sh in=sequences.fasta address=http://myserver.com:3069
sendclade.sh in=sequences.fasta hits=10 oneline out=results.tsv
sendclade.sh in=sequences.fasta local=t percontig=t minlen=1000
sendclade.sh in=bin1.fa,bin2.fa,bin3.fa hits=5 heap=10

Parameters

File Parameters

in=<file,file>
Query files or directories. Input can be in fasta, fastq, .clade, or .spectra format. Pre-computed .clade/.spectra files are sent directly without sequence processing. Multiple files can be given as a comma-separated list, and bare file names are also accepted as additional arguments.
out=stdout
Output file for results. If not specified, results are written to standard output. Progress messages always go to stderr.
local=f
Use a local server at localhost:5002 instead of the default remote server. Useful for testing or when running your own CladeServer.
address=<url>
Specify custom server address. Should include full URL with protocol and port, e.g., http://myserver.com:3069/clade. If protocol is omitted, http:// is assumed. Default: https://bbmapservers.jgi.doe.gov/quickclade
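
The protocol rule described for address= can be illustrated with a small shell helper. This is only a sketch of the documented behavior (http:// is assumed when no protocol is given), not SendClade's actual code; the function name normalize_address is hypothetical.

```shell
# Sketch of the documented rule: keep an address that already has a protocol,
# otherwise prepend http://. Hypothetical helper, not part of BBTools.
normalize_address() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;          # protocol present: leave unchanged
    *)     printf 'http://%s\n' "$1" ;;   # protocol missing: assume http://
  esac
}
```

For example, normalize_address myserver.com:3069/clade prints http://myserver.com:3069/clade, while an https:// address is returned as given.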

Basic Parameters

hits=1
Number of top taxonomic hits to return per query. More hits provide alternative classifications but increase output size.
oneline=f
Print results in tab-delimited format with one line per query. Default format is human-readable with detailed information. Oneline format includes: QueryName, Q_GC, Q_Bases, Q_Contigs, RefName, R_TaxID, R_GC, R_Bases, R_Contigs, R_Level, GCdif, STRdif, k3dif, k4dif, k5dif, lineage.
percontig=f
Process each contig/sequence separately instead of combining all sequences from each file into a single query. When true, each contig gets its own taxonomic classification. When false, all sequences in a file are combined for classification.
minlen=0
Minimum contig length in percontig mode. Contigs shorter than this threshold are ignored. Only applies when percontig=true.
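
Before running with percontig=t and a minlen threshold, it can help to estimate how many contigs would pass the length filter. The helper below is a rough sketch that counts FASTA records at or above a given length; the function name contigs_passing is hypothetical, and it assumes plain FASTA on standard input (not fastq).

```shell
# Count FASTA records whose total sequence length is >= the given minimum.
# Hypothetical helper for estimating the effect of minlen; reads FASTA on stdin.
contigs_passing() {
  awk -v min="$1" '
    /^>/ { if (seen && len >= min + 0) n++; len = 0; seen = 1; next }  # new record: score the previous one
    { len += length($0) }                                              # accumulate sequence length
    END { if (seen && len >= min + 0) n++; print n + 0 }               # score the last record
  '
}
```

Usage: contigs_passing 1000 < sequences.fasta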

Advanced Parameters

heap=1
Number of intermediate comparison results to store during processing. Higher values may improve accuracy for complex queries at the cost of increased memory usage on the server.
printqtid=f
Print query TaxID if present in sequence headers. Useful for benchmarking when query sequences have known taxonomic labels in the format 'tid_1234' or similar.
banself=f
Ban self-matches by ignoring records with the same TaxID as the query. Makes the program behave as if that organism is not in the reference database. Useful for testing accuracy.
verbose=f
Enable detailed progress reporting and timing information. Shows batch processing, server communication details, and performance metrics.

Standard BBTools Parameters

overwrite=f
Allow overwriting of existing output files.
append=f
Append to existing output files instead of overwriting.

Java Parameters

-Xmx
Set Java's memory usage. Default: 2g (fixed allocation for client operations). SendClade uses minimal memory as no reference database is loaded locally.
-eoom
Exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Server Communication

The default server is: https://bbmapservers.jgi.doe.gov/quickclade

Sequences are sent in batches of up to 4000 clades for efficient processing. The server responds with taxonomic classifications in either human-readable or tab-delimited format depending on the oneline parameter.
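
As a worked example of the batch limit, the number of batches needed for a given number of clades is the ceiling of the count divided by 4000. The sketch below assumes batches are always filled to the limit before a new one starts, which the text above does not state explicitly; batch_count is a hypothetical name.

```shell
# Ceiling division: how many batches of at most `size` clades are needed.
# The default of 4000 comes from the batch limit stated above; the assumption
# that batches are filled completely is ours.
batch_count() {
  n="$1"
  size="${2:-4000}"
  echo $(( (n + size - 1) / size ))
}
```

Under that assumption, 4000 clades fit in one batch, 4001 need two, and 10000 need three.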

Batch Processing

SendClade automatically batches queries to optimize network communication and server processing.

Examples

Basic Taxonomic Classification

sendclade.sh in=sequences.fasta

Classify sequences using the default remote server with human-readable output to stdout.

Machine-Readable Output with Multiple Hits

sendclade.sh in=sequences.fasta hits=10 oneline out=results.tsv

Get top 10 taxonomic matches per query in tab-delimited format suitable for downstream analysis.

Per-Contig Analysis with Length Filter

sendclade.sh in=sequences.fasta local=t percontig=t minlen=1000

Classify each contig separately using a local server, ignoring sequences shorter than 1000bp.

Multiple Files with Custom Server

sendclade.sh in=bin1.fa,bin2.fa,bin3.fa address=http://myserver.com:3069

Process multiple input files using a custom CladeServer instance.
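
When the input files come from a shell glob rather than a hand-typed list, the comma-separated value for in= can be built with a small helper. This is a convenience sketch only; join_inputs is a hypothetical name, and it assumes file names contain no commas.

```shell
# Join all arguments with commas, producing a value suitable for in=.
# Runs in a subshell so the IFS change does not leak into the caller.
join_inputs() (
  IFS=,
  printf '%s\n' "$*"
)
```

Usage: sendclade.sh in="$(join_inputs bins/*.fa)" address=http://myserver.com:3069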

Pre-computed Clade Files

sendclade.sh in=precomputed.clade hits=5 heap=10

Send pre-computed .clade or .spectra files directly to the server without sequence processing overhead.

Benchmarking with Verbose Output

sendclade.sh in=labeled.fasta printqtid=t banself=t verbose=t out=benchmark.txt

Evaluate classification accuracy with labeled queries, excluding self-matches and printing detailed timing information.

Output Format

Human-Readable Format (Default)

The default output provides detailed information for each query:

#Query1
query_name 1:    0.475    40    1    Cortinarius geophilus var. subauroreus    2764306    0.416    609    1    1    0.059    0.167    0.579    1.000    1.000    sk__Eukaryota;k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Cortinariaceae;g__Cortinarius;s__Cortinarius geophilus

Machine-Readable Format (oneline=t)

Tab-delimited format with header and one line per query:

#QueryName    Q_GC    Q_Bases    Q_Contigs    RefName    R_TaxID    R_GC    R_Bases    R_Contigs    R_Level    GCdif    STRdif    k3dif    k4dif    k5dif    lineage
query_name    0.475    40    1    Cortinarius geophilus var. subauroreus    2764306    0.416    609    1    1    0.059    0.167    0.579    1.000    1.000    sk__Eukaryota;k__Fungi;...
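
Because the oneline format is tab-delimited with a '#'-prefixed header, standard tools can pull out individual columns. The sketch below extracts QueryName, RefName, and R_TaxID (columns 1, 5, and 6 in the header above); top_hits is a hypothetical helper name, and it assumes the output really is tab-separated as described.

```shell
# Print QueryName, RefName, and R_TaxID from oneline output,
# skipping the '#'-prefixed header. Reads files given as arguments, or stdin.
top_hits() {
  awk -F'\t' -v OFS='\t' '!/^#/ && NF >= 6 { print $1, $5, $6 }' "$@"
}
```

Usage: top_hits results.tsv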

Column Descriptions

Q_GC
Query GC content (fraction)
Q_Bases
Query sequence length in bases
Q_Contigs
Number of contigs in query
RefName
Reference organism name
R_TaxID
Reference NCBI taxonomic ID
R_GC
Reference GC content
R_Bases
Reference sequence length
R_Contigs
Number of contigs in reference
R_Level
Taxonomic level of assignment
GCdif
Absolute GC content difference
STRdif
Strandedness difference
k3dif, k4dif, k5dif
K-mer frequency differences for 3-mer, 4-mer, and 5-mer comparisons. Lower k5dif values indicate higher confidence classifications.
lineage
Full taxonomic lineage from superkingdom to species
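
Since lower k5dif values indicate higher-confidence classifications, oneline output can be filtered on that column (the 15th in the header order listed above). The sketch below keeps the header and any row at or below a caller-supplied cutoff. The name filter_k5dif is hypothetical, and no particular cutoff is recommended by this document; choose one based on your own data.

```shell
# Keep header lines plus rows whose k5dif (column 15) is <= the given cutoff.
# Reads tab-delimited oneline output on stdin; the cutoff is the first argument.
filter_k5dif() {
  awk -F'\t' -v max="$1" '/^#/ || (NF >= 15 && ($15 + 0) <= (max + 0))'
}
```

Usage: filter_k5dif 0.5 < results.tsv > confident.tsv (the value 0.5 here is an arbitrary placeholder).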

Performance Notes

SendClade is designed for high-throughput processing with minimal client-side resource requirements.

Support

Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

For documentation and the latest version, visit: https://bbmap.org