TadWrapper

Script: tadwrapper.sh
Package: assemble
Class: TadpoleWrapper.java

Generates multiple assemblies with Tadpole, one per kmer length, and estimates the optimal kmer length by comparing the assemblies on L50, L90, maximum contig length, and total contig count.

Basic Usage

tadwrapper.sh in=reads.fq out=contigs%.fa k=31,62,93

The output filename must contain a % symbol which will be replaced with the kmer length for each assembly. For example, "contigs%.fa" becomes "contigs31.fa", "contigs62.fa", "contigs93.fa".
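The substitution can be illustrated with a short Python sketch. The function name expand_template is hypothetical and not part of BBTools; it only mirrors the replacement rule described above and assumes the template contains a single % symbol.

```python
def expand_template(template, kmers):
    """Return one output filename per kmer length (illustrative only)."""
    if "%" not in template:
        # The wrapper requires a % placeholder in the out= template.
        raise ValueError("output template must contain a % symbol")
    # Assumption: only the first % is treated as the placeholder.
    return [template.replace("%", str(k), 1) for k in kmers]


print(expand_template("contigs%.fa", [31, 62, 93]))
```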

Parameters

TadWrapper accepts specific wrapper parameters for controlling the assembly optimization process, while all other parameters are passed directly to Tadpole for assembly.

Wrapper Parameters

out=<file>
Output file name template. Must contain a % symbol which will be replaced with the kmer length for each assembly. Example: "contigs%.fa" generates "contigs31.fa", "contigs62.fa", etc.
outfinal=<file>
Optional. If specified, the best assembly file will be renamed to this filename after optimization is complete. The best assembly is determined by comparing L50, L90, maximum contig length, and total contig count metrics.
k=31
Comma-delimited list of kmer lengths to test. Example: k=31,62,93 will generate assemblies with kmers 31, 62, and 93. If not specified, defaults to k=31. Each kmer value is normalized using Kmer.getKbig() to ensure valid kmer lengths.
delete=f
Delete intermediate assemblies before terminating, keeping only the best assembly. Set to true to conserve disk space after optimization completes. Default: false.
quitearly=f
Quit optimization once assembly metrics stop improving with longer kmers. This can significantly reduce computation time when the optimal kmer has been found. Default: false.
bisect=f
Enable bisection search mode. Recursively assembles with kmer values midway between the two best kmers, computed as (left+middle+1)/2 and (middle+right+1)/2, until no further improvement is found. Default: false.
expand=f
Enable expansion search mode. Recursively tests kmers shorter (0.7x current) or longer (1.25x current, +40 max) than the current best, via the expandLeft() and expandRight() methods, until improvement halts. Used in conjunction with bisect mode. Default: false.

Examples

Basic Kmer Optimization

tadwrapper.sh in=reads.fq out=assembly_k%.fa k=31,51,71,91

Tests four different kmer lengths (31, 51, 71, 91) and produces assemblies: assembly_k31.fa, assembly_k51.fa, assembly_k71.fa, assembly_k91.fa. Reports the optimal kmer length based on assembly quality metrics.

Optimization with Final Output

tadwrapper.sh in=reads.fq out=temp_k%.fa outfinal=best_assembly.fa k=25,35,45,55,65 delete=t

Tests five kmer lengths, renames the best assembly to "best_assembly.fa", and deletes all intermediate assemblies to save disk space.

Advanced Optimization with Bisection

tadwrapper.sh in=reads.fq out=contigs%.fa k=31,63,95 bisect=t expand=t quitearly=t

Starts with three initial kmer values, then uses bisection to find intermediate optimal values. Expands the search range if the optimal kmer is at the boundaries. Stops early if metrics don't improve.

Passing Tadpole Parameters

tadwrapper.sh in=reads.fq out=assembly%.fa k=31,51,71 mincov=2 mincontig=200

All parameters except the wrapper-specific ones (out, outfinal, k, delete, quitearly, bisect, expand) are passed directly to Tadpole for each assembly.
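The split between wrapper parameters and pass-through parameters can be pictured with the following Python sketch. The function split_args and the constant WRAPPER_KEYS are hypothetical names for illustration; the key list is the one given above, and the actual parsing in TadpoleWrapper.java may differ in details such as case handling.

```python
# Wrapper-specific keys, as listed in the text above.
WRAPPER_KEYS = {"out", "outfinal", "k", "delete", "quitearly", "bisect", "expand"}


def split_args(args):
    """Separate key=value arguments into wrapper settings and pass-through args."""
    wrapper, passthrough = {}, []
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key.lower() in WRAPPER_KEYS:
            wrapper[key.lower()] = value
        else:
            # Everything else would be handed to Tadpole unchanged.
            passthrough.append(arg)
    return wrapper, passthrough


wrapper, passthrough = split_args(
    ["in=reads.fq", "out=assembly%.fa", "k=31,51,71", "mincov=2", "mincontig=200"]
)
# The k parameter is a comma-delimited list of kmer lengths.
kmers = [int(x) for x in wrapper["k"].split(",")]
print(wrapper, passthrough, kmers)
```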

Algorithm Details

Assembly Quality Comparison

TadWrapper uses a multi-metric comparison system implemented in the Record.compareTo() method to determine the optimal kmer length. The comparison weighs L50, L90, maximum contig length, and total contig count in a fixed priority order.
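As a rough illustration of a prioritized multi-metric comparison, the Python sketch below compares records lexicographically. The priority order used here (L50, then L90, then maximum contig length, then contig count) and the direction of each metric (larger lengths are better, fewer contigs are better) are assumptions made for the example, not a statement of what Record.compareTo() does. The Record class is a hypothetical stand-in, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical summary of one assembly (not the Java Record class)."""
    k: int
    l50: int
    l90: int
    max_contig: int
    contigs: int


def quality_key(r):
    # Assumed priority: L50, L90, max contig length, then fewer contigs.
    return (r.l50, r.l90, r.max_contig, -r.contigs)


def best_record(records):
    """Pick the record that ranks highest under the assumed ordering."""
    return max(records, key=quality_key)


# Made-up metrics, for illustration only.
records = [
    Record(k=31, l50=12000, l90=3000, max_contig=80000, contigs=950),
    Record(k=51, l50=18000, l90=4100, max_contig=95000, contigs=700),
    Record(k=71, l50=18000, l90=3900, max_contig=99000, contigs=650),
]
print(best_record(records).k)
```

In this example k=51 and k=71 tie on the first metric, so the second metric decides between them.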

Optimization Modes

Basic Mode

Tests each specified kmer length in order, comparing assembly quality using the multi-metric system. Each assembly is generated by calling Tadpole with the current kmer length and all user-provided parameters.

Early Termination (quitearly=t)

Stops testing additional kmer lengths once assembly quality metrics stop improving. This optimization can save significant computation time when the quality plateau has been reached.
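The difference between basic mode and early termination can be sketched in Python as follows. The score callable is a hypothetical stand-in for running Tadpole at a given kmer length and summarizing the result; reducing assembly quality to a single number is a simplification of the multi-metric comparison, and the kmer list is assumed to be in ascending order.

```python
def scan_kmers(kmers, score, quit_early=False):
    """Test kmers in order; optionally stop at the first non-improving one."""
    best_k, best_score = None, None
    tested = []
    for k in kmers:
        s = score(k)  # stand-in for assembling with Tadpole at this k
        tested.append(k)
        if best_score is None or s > best_score:
            best_k, best_score = k, s
        elif quit_early:
            # Quality stopped improving, so skip the remaining kmers.
            break
    return best_k, tested


# Made-up quality scores, for illustration only.
fake_scores = {31: 1000, 51: 1800, 71: 1500, 91: 900}
print(scan_kmers([31, 51, 71, 91], fake_scores.get))
print(scan_kmers([31, 51, 71, 91], fake_scores.get, quit_early=True))
```

With these made-up scores, both calls select k=51, but the early-termination run never assembles at k=91.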

Bisection Mode (bisect=t)

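After testing the initial kmer values, recursively tests kmer lengths midway between adjacent values, computed as (left+middle+1)/2 and (middle+right+1)/2, to find finer-grained optima. The recursion stops when no further improvement is found.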
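The midpoint arithmetic can be worked through with a small Python sketch. The function name bisect_candidates is hypothetical; the formulas are the (left+middle+1)/2 and (middle+right+1)/2 calculations from the bisect parameter description, using integer division as Java would for int operands. The real tool also normalizes kmer values with Kmer.getKbig(), which this sketch does not model.

```python
def bisect_candidates(left, middle, right):
    """Return the two kmer lengths midway between middle and its neighbors."""
    new_left = (left + middle + 1) // 2
    new_right = (middle + right + 1) // 2
    return new_left, new_right


# For the initial values 31, 63, 95 from the example above:
print(bisect_candidates(31, 63, 95))  # (47, 79)
```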

Expansion Mode (expand=t)

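Tests kmer lengths outside the initial range if the optimal kmer is at a boundary. Shorter candidates are 0.7x the current best (expandLeft()) and longer candidates are 1.25x the current best, with a +40 maximum (expandRight()). Expansion continues until improvement halts.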
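The expansion factors from the expand parameter description can be sketched in Python as below. The function names echo expandLeft() and expandRight() but are hypothetical re-creations, not the Java methods. Two points are assumptions: fractional results are truncated toward zero, and "+40 max" is read as capping the longer candidate at the current kmer plus 40. Normalization through Kmer.getKbig() is not modeled.

```python
def expand_left(k):
    """Shorter candidate: 0.7x the current best (truncation is an assumption)."""
    return int(k * 0.7)


def expand_right(k):
    """Longer candidate: 1.25x the current best, assumed capped at k + 40."""
    return min(int(k * 1.25), k + 40)


print(expand_left(31), expand_right(31))
print(expand_left(200), expand_right(200))
```

Under this reading the cap only takes effect for kmers above 160, where 1.25x would add more than 40.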

Memory Management

TadWrapper includes explicit garbage collection (System.gc()) between assemblies to manage memory usage during iterative assembly generation. The default memory allocation is 14GB (-Xmx14g -Xms14g), which can be overridden using standard Java memory flags.

Output File Handling

The wrapper manages output files by renaming the best assembly with File.renameTo(), with ReformatReads as a fallback.
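A rename-with-fallback pattern, combined with the delete option, might look like the Python sketch below. The function finalize_outputs is hypothetical. Here os.replace plays the role of File.renameTo(), and a plain file copy stands in for the ReformatReads fallback; treating the fallback as something that runs when the rename fails is an assumption.

```python
import os
import shutil


def finalize_outputs(best_path, other_paths, outfinal=None, delete=False):
    """Move the best assembly to outfinal and optionally remove the rest."""
    final_path = best_path
    if outfinal:
        try:
            # Analogous to File.renameTo() in the Java wrapper.
            os.replace(best_path, outfinal)
        except OSError:
            # Stand-in for the ReformatReads fallback: copy the contents.
            shutil.copyfile(best_path, outfinal)
        final_path = outfinal
    if delete:
        # Remove intermediate assemblies, keeping only the best one.
        for path in other_paths:
            if os.path.exists(path):
                os.remove(path)
    return final_path
```

For example, finalize_outputs("temp_k45.fa", ["temp_k25.fa", "temp_k35.fa"], outfinal="best_assembly.fa", delete=True) would leave only best_assembly.fa on disk, assuming the k=45 assembly was the best.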

Support

For questions and support: