TadWrapper
Runs Tadpole repeatedly at different kmer lengths and compares the resulting assemblies by L50, L90, maximum contig length, and total contig count to estimate the optimal kmer length.
Basic Usage
tadwrapper.sh in=reads.fq out=contigs%.fa k=31,62,93
The output filename must contain a % symbol, which is replaced with the kmer length for each assembly. For example, with k=31,62,93, "contigs%.fa" becomes "contigs31.fa", "contigs62.fa", and "contigs93.fa".
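The template substitution described above can be sketched in a few lines of Python. This is an illustration only, not TadWrapper's code: the function name expand_template is hypothetical, and replacing every % (rather than only the first) is an assumption.

```python
def expand_template(template, kmers):
    """Return one output filename per kmer length by substituting '%'.

    Assumption: every '%' in the template is replaced; the real tool
    may treat templates with several '%' symbols differently.
    """
    if "%" not in template:
        raise ValueError("output template must contain a % symbol")
    return [template.replace("%", str(k)) for k in kmers]


names = expand_template("contigs%.fa", [31, 62, 93])
print(names)
```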
Parameters
TadWrapper accepts specific wrapper parameters for controlling the assembly optimization process, while all other parameters are passed directly to Tadpole for assembly.
Wrapper Parameters
- out=<file>
- Output file name template. Must contain a % symbol which will be replaced with the kmer length for each assembly. Example: "contigs%.fa" generates "contigs31.fa", "contigs62.fa", etc.
- outfinal=<file>
- Optional. If specified, the best assembly file will be renamed to this filename after optimization is complete. The best assembly is determined by comparing L50, L90, maximum contig length, and total contig count metrics.
- k=31
- Comma-delimited list of kmer lengths to test. Example: k=31,62,93 generates assemblies with kmer lengths 31, 62, and 93. Defaults to k=31 if not specified. Each value is normalized with Kmer.getKbig() to ensure a valid kmer length.
- delete=f
- Delete intermediate assemblies before terminating, keeping only the best assembly. Set to true to conserve disk space after optimization completes. Default: false.
- quitearly=f
- Quit optimization once assembly metrics stop improving with longer kmers. This can significantly reduce computation time when the optimal kmer has been found. Default: false.
- bisect=f
- Enable bisection search mode. Recursively assembles with kmer values midway between the two best kmers, computed as (left+middle+1)/2 and (middle+right+1)/2 with integer division, until no further improvement is found. Default: false.
- expand=f
- Enable expansion search mode. Recursively tests kmers shorter (0.7x the current best) or longer (1.25x the current best, capped at +40) than the current best until improvement halts, via the expandLeft() and expandRight() methods. Used in conjunction with bisect mode. Default: false.
Examples
Basic Kmer Optimization
tadwrapper.sh in=reads.fq out=assembly_k%.fa k=31,51,71,91
Tests four different kmer lengths (31, 51, 71, 91) and produces assemblies: assembly_k31.fa, assembly_k51.fa, assembly_k71.fa, assembly_k91.fa. Reports the optimal kmer length based on assembly quality metrics.
Optimization with Final Output
tadwrapper.sh in=reads.fq out=temp_k%.fa outfinal=best_assembly.fa k=25,35,45,55,65 delete=t
Tests five kmer lengths, renames the best assembly to "best_assembly.fa", and deletes all intermediate assemblies to save disk space.
Advanced Optimization with Bisection
tadwrapper.sh in=reads.fq out=contigs%.fa k=31,63,95 bisect=t expand=t quitearly=t
Starts with three initial kmer values, then uses bisection to find intermediate optimal values. Expands the search range if the optimal kmer is at the boundaries. Stops early if metrics don't improve.
Passing Tadpole Parameters
tadwrapper.sh in=reads.fq out=assembly%.fa k=31,51,71 mincov=2 mincontig=200
All parameters except the wrapper-specific ones (out, outfinal, k, delete, quitearly, bisect, expand) are passed directly to Tadpole for each assembly.
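As a rough illustration of this split, the sketch below separates wrapper keys from pass-through arguments. The function name split_args and the parsing details (lowercasing keys, splitting on the first "=") are assumptions made for the example, not TadWrapper's actual parser.

```python
# Wrapper-specific keys, taken from the list above.
WRAPPER_KEYS = {"out", "outfinal", "k", "delete", "quitearly", "bisect", "expand"}


def split_args(args):
    """Split key=value arguments into wrapper settings and Tadpole pass-through."""
    wrapper, passthrough = {}, []
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key.lower() in WRAPPER_KEYS:
            wrapper[key.lower()] = value
        else:
            passthrough.append(arg)  # forwarded unchanged to each assembly
    return wrapper, passthrough


wrapper, passthrough = split_args(
    ["in=reads.fq", "out=assembly%.fa", "k=31,51,71", "mincov=2", "mincontig=200"]
)
print(wrapper)
print(passthrough)
```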
Algorithm Details
Assembly Quality Comparison
TadWrapper uses a multi-metric comparison system implemented in the Record.compareTo() method to determine the optimal kmer length. The comparison algorithm prioritizes metrics in the following order:
- L50 Comparison: Primary metric with 1% tolerance (L50 must differ by >1% to be considered better)
- L90 Comparison: Secondary metric with 1% tolerance for tie-breaking
- Maximum Contig Length: Tertiary metric with 1% tolerance
- Total Contig Count: Fewer contigs preferred (indicates better assembly connectivity)
- Fine-grained Comparison: If still tied, uses 0.2% tolerance for L50, L90, and maximum contig length
- Final Tie-breaker: Prefers shorter kmer lengths for computational efficiency
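The priority order above can be expressed as a comparator. The following Python sketch is a reading of that list, not a port of Record.compareTo(): the field names, the exact tolerance formula, and the behavior at the tolerance boundary are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names for the metrics listed above.
    k: int
    l50: int
    l90: int
    max_contig: int
    contigs: int


def _cmp_tol(x, y, tol):
    """+1 if x exceeds y by more than tol (a fraction), -1 for the reverse, else 0."""
    if x > y * (1 + tol):
        return 1
    if y > x * (1 + tol):
        return -1
    return 0


def compare(a, b):
    """Positive if a is the better assembly, negative if b is, 0 if tied."""
    def lengths(r):
        return (r.l50, r.l90, r.max_contig)

    # Coarse pass: L50, then L90, then max contig length, each at 1%.
    for x, y in zip(lengths(a), lengths(b)):
        c = _cmp_tol(x, y, 0.01)
        if c:
            return c
    # Fewer contigs wins.
    if a.contigs != b.contigs:
        return 1 if a.contigs < b.contigs else -1
    # Fine pass: the same three metrics at 0.2%.
    for x, y in zip(lengths(a), lengths(b)):
        c = _cmp_tol(x, y, 0.002)
        if c:
            return c
    # Final tie-breaker: the shorter kmer wins.
    if a.k != b.k:
        return 1 if a.k < b.k else -1
    return 0


a = Record(k=31, l50=10000, l90=3000, max_contig=50000, contigs=120)
b = Record(k=62, l50=10050, l90=3000, max_contig=50000, contigs=100)
print(compare(a, b))  # L50 differs by only 0.5%, so contig count decides
```

In this toy comparison the 0.5% L50 difference falls inside the 1% tolerance, so the decision drops through to contig count.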
Optimization Modes
Basic Mode
Tests each specified kmer length in order, comparing assembly quality using the multi-metric system. Each assembly is generated by calling Tadpole with the current kmer length and all user-provided parameters.
Early Termination (quitearly=t)
Stops testing additional kmer lengths once assembly quality metrics stop improving. This optimization can save significant computation time when the quality plateau has been reached.
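A minimal sketch of the early-termination loop follows, assuming kmers are tested from shortest to longest and that the search stops at the first kmer that fails to beat the current best. The callbacks assemble and is_better are hypothetical stand-ins for running Tadpole and comparing metrics, and the L50 values are invented for illustration.

```python
def pick_best(kmers, assemble, is_better, quit_early=False):
    """Assemble at each kmer (shortest first) and return the best result."""
    best = None
    for k in sorted(kmers):
        result = assemble(k)
        if best is None or is_better(result, best):
            best = result
        elif quit_early:
            break  # metrics stopped improving; skip the longer kmers
    return best


# Toy data: L50 per kmer, invented purely for illustration.
toy_l50 = {31: 5000, 51: 8000, 71: 7000, 91: 9000}
tested = []


def toy_assemble(k):
    tested.append(k)
    return (k, toy_l50[k])


def toy_is_better(new, old):
    return new[1] > old[1]


best = pick_best(toy_l50, toy_assemble, toy_is_better, quit_early=True)
print(best, tested)
```

With this toy data the loop stops after k=71 and never assembles k=91, which would have scored higher. That is the trade-off of early termination: less computation, at the risk of missing a later improvement.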
Bisection Mode (bisect=t)
After testing initial kmer values, recursively tests kmer lengths midway between adjacent values to find finer-grained optima:
- Calculates midpoint kmers: k1 = (left + middle + 1) / 2, k2 = (middle + right + 1) / 2
- Uses Kmer.getKbig() to ensure valid kmer lengths
- Continues bisection recursively until no improvement is found via Record.compareTo() evaluation
- Prevents infinite loops by checking for duplicate kmer values (k1==left.k || k1==mid.k)
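The midpoint arithmetic from the list above can be worked through in Python. This sketch covers only the midpoint and duplicate checks; the Kmer.getKbig() normalization step is omitted, the function name is hypothetical, and applying the duplicate check to k2 in the same way as to k1 is an assumption.

```python
def bisect_candidates(left, middle, right):
    """Return the new kmer lengths to test between three already-tested kmers."""
    k1 = (left + middle + 1) // 2   # integer division, as in the formula above
    k2 = (middle + right + 1) // 2
    candidates = []
    if k1 not in (left, middle):    # skip kmers that were already assembled
        candidates.append(k1)
    if k2 not in (middle, right) and k2 not in candidates:
        candidates.append(k2)
    return candidates


print(bisect_candidates(31, 62, 93))  # (31+62+1)//2 = 47, (62+93+1)//2 = 78
print(bisect_candidates(31, 32, 33))  # both midpoints collide with tested kmers
```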
Expansion Mode (expand=t)
Tests kmer lengths outside the initial range if the optimal kmer is at the boundaries:
- Left Expansion: Tests progressively shorter kmers (0.7x multiplier) if the optimal kmer is the shortest tested
- Right Expansion: Tests progressively longer kmers (min(current+40, current*1.25)) if the optimal kmer is the longest tested; the +40 cap takes effect once current exceeds 160
- Continues expansion until assembly quality stops improving
- Runs together with bisection when both expand=t and bisect=t are set
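The expansion step sizes can be checked with a short sketch. Integer arithmetic is used here to avoid floating-point rounding; how the real expandLeft() and expandRight() methods round, and whether they normalize the result with Kmer.getKbig(), is not specified above, so both function bodies are assumptions.

```python
def expand_left(k):
    """Next shorter kmer to try: 0.7x the current one, rounded down (assumed)."""
    return k * 7 // 10


def expand_right(k):
    """Next longer kmer to try: 1.25x the current one, capped at k + 40."""
    return min(k + 40, k * 5 // 4)


print(expand_left(31))    # 21
print(expand_right(31))   # 38, because 1.25x is the smaller step here
print(expand_right(200))  # 240, because the +40 cap applies above k=160
```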
Memory Management
TadWrapper calls System.gc() between assemblies to request garbage collection and limit memory growth during iterative assembly. The default memory allocation is 14GB (-Xmx14g -Xms14g), which can be overridden using standard Java memory flags.
Output File Handling
The wrapper manages output files with File.renameTo(), falling back to ReformatReads when a rename fails:
- Validates that output template contains % symbol
- Automatically renames best assembly to outfinal if specified
- Uses ReformatReads for cross-platform file copying when direct rename fails
- Optionally deletes intermediate assemblies to conserve disk space
- Provides clear reporting of the recommended optimal kmer length
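The rename-with-fallback pattern above can be sketched in Python as an analogy. Here os.replace stands in for File.renameTo() and shutil.copyfile stands in for the ReformatReads copy; the function names are hypothetical, and whether the original file is removed after a fallback copy is not stated above, so the sketch leaves it in place.

```python
import os
import shutil
import tempfile


def promote_best(best_path, final_path):
    """Rename the best assembly to its final name, copying if the rename fails."""
    try:
        os.replace(best_path, final_path)
    except OSError:
        # For example, source and destination are on different filesystems.
        shutil.copyfile(best_path, final_path)


def delete_intermediates(paths, keep):
    """Remove intermediate assemblies, leaving the file named by keep."""
    for path in paths:
        if path != keep and os.path.exists(path):
            os.remove(path)


# Demo in a throwaway directory with placeholder assemblies.
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, f"contigs{k}.fa") for k in (31, 62, 93)]
    for path in paths:
        with open(path, "w") as fh:
            fh.write(">contig_1\nACGT\n")
    final = os.path.join(tmp, "best_assembly.fa")
    promote_best(paths[1], final)
    delete_intermediates(paths, keep=final)
    remaining = sorted(os.listdir(tmp))

print(remaining)
```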
Support
For questions and support:
- Email: bbushnell@lbl.gov
- Documentation: bbmap.org