Lilypad

Script: lilypad.sh
Package: consensus
Class: Lilypad.java

Uses mapped paired reads to generate scaffolds from contigs. Designed for use with ordinary paired-end Illumina libraries.

Basic Usage

lilypad.sh in=mapped.sam ref=contigs.fa out=scaffolds.fa

Lilypad takes mapped paired-end reads and reference contigs to generate scaffolded assemblies by analyzing insert size distributions and read pair orientations.

Parameters

Parameters are organized by their function in the scaffolding process.

Standard Parameters

in=<file>
Reads mapped to the reference; should be sam or bam format. Required input parameter.
ref=<file>
Reference contigs; may be fasta or fastq format. Required reference parameter.
out=<file>
Modified reference output; should be fasta format. Generated scaffolds will be written here.
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file. Default: false.

Processing Parameters

gap=10
Pad gaps between joined contigs with at least this many N bases. Default: 10.
mindepth=4
Minimum spanning read pairs to join contigs. Higher values require more evidence for joining but reduce misassemblies. Default: 4.
maxinsert=3000
Maximum allowed insert size for proper pairs. Read pairs with insert sizes above this threshold are filtered out. Default: 3000.
mincontig=200
Ignore contigs under this length if there is a longer alternative. Helps prioritize longer, more reliable contigs during scaffolding. Default: 200.
minwr=0.8
(minWeightRatio) Minimum fraction of outgoing edges pointing to the same contig. Lower values will increase continuity at the risk of misassemblies. Range: 0.0-1.0. Default: 0.8.
minsr=0.8
(minStrandRatio) Minimum fraction of outgoing edges indicating the same orientation. Lower values will increase continuity at a possible risk of inversions. Range: 0.0-1.0. Default: 0.8.
passes=8
Number of scaffolding passes to perform. More passes may increase continuity by allowing iterative improvement of scaffold connections. Default: 8.
samestrandpairs=f
Read pairs map to the same strand. Set to true for libraries where both reads in a pair have the same orientation. Currently untested. Default: false.
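
The way mindepth, minwr, and minsr interact can be illustrated with a minimal Python sketch. This is not Lilypad's code: the function name pick_join, the tuple layout of edges, and the order in which the checks run are all assumptions made for illustration only.

```python
from collections import defaultdict

def pick_join(edges, mindepth=4, minwr=0.8, minsr=0.8):
    """Return (target, same_orientation) for one contig end, or None.

    edges: list of (target_contig, same_orientation, pair_count) tuples,
    a hypothetical summary of the read pairs leaving one contig end.
    """
    total = sum(count for _, _, count in edges)
    if total == 0:
        return None
    # Sum the evidence per destination contig.
    per_target = defaultdict(int)
    for target, _, count in edges:
        per_target[target] += count
    best, best_weight = max(per_target.items(), key=lambda kv: kv[1])
    # mindepth: enough spanning pairs must support the best destination.
    if best_weight < mindepth:
        return None
    # minwr: the best destination must hold a large enough share of all edges.
    if best_weight / total < minwr:
        return None
    # minsr: pairs to that destination must mostly agree on orientation.
    same = sum(c for t, s, c in edges if t == best and s)
    dominant = max(same, best_weight - same)
    if dominant / best_weight < minsr:
        return None
    return best, same >= best_weight - same
```

In this sketch, nine pairs to one contig and one pair to another pass the default thresholds, while an even 5/5 split fails minwr and a 6/4 orientation split fails minsr.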

Java Parameters

-Xmx
Sets Java's memory usage, overriding autodetection. For example, -Xmx20g specifies 20 GB of RAM and -Xmx200m specifies 200 MB. The maximum is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions. May provide minor performance improvement in production runs.

Examples

Basic Scaffolding

lilypad.sh in=mapped_pairs.sam ref=contigs.fasta out=scaffolds.fasta

Scaffolds contigs using paired-end reads mapped to the reference with default parameters.

Conservative Scaffolding

lilypad.sh in=mapped_pairs.sam ref=contigs.fasta out=scaffolds.fasta mindepth=8 minwr=0.9 minsr=0.9

Uses more stringent parameters to reduce misassemblies: requires more read pair evidence (mindepth=8) and higher consensus for edge directions (minwr=0.9, minsr=0.9).

Large Insert Libraries

lilypad.sh in=mate_pairs.sam ref=contigs.fasta out=scaffolds.fasta maxinsert=8000 gap=50

Configured for mate pair libraries with larger insert sizes. Raises the maximum insert size to 8000 bp and the minimum gap padding to 50 Ns.
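
To make the effect of the gap parameter concrete, here is a minimal Python sketch of N padding between two joined contigs. It is an illustration, not Lilypad's implementation: the function join_contigs and the estimated_gap argument are hypothetical, and the sketch assumes some gap estimate is already available and ignores contig orientation.

```python
def join_contigs(left, right, estimated_gap, gap=10):
    """Join two contig sequences with N padding of at least `gap` bases.

    estimated_gap is a hypothetical distance estimate between the contigs;
    it may be smaller than `gap` (or negative), in which case `gap` wins.
    """
    pad = max(gap, estimated_gap)
    return left + "N" * pad + right
```

With gap=50, two contigs whose estimated distance is only 12 bases would still be separated by 50 Ns, while an estimate of 300 would produce 300 Ns.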

High Memory Usage

lilypad.sh -Xmx32g in=large_dataset.sam ref=assembly.fasta out=scaffolds.fasta

Allows Java to use up to 32 GB of memory for processing large datasets with many contigs and read pairs.

Algorithm Details

Lilypad analyzes paired-end read mappings to determine contig connectivity and orientation. The implementation is LinkedHashMap-based, represents the scaffold graph with Edge and Contig classes, and validates candidate connections by weight.

Core Algorithm Components

Graph Construction

The algorithm builds a scaffold graph where:

Insert Size Analysis

Lilypad performs insert size analysis using 1000-bucket histograms:
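
As a rough illustration of a bucketed insert-size histogram, the following Python sketch counts insert sizes into 1000 buckets and reports the most common size. The bucket width calculation, the handling of out-of-range pairs, and the function names are assumptions for this example and may differ from what Lilypad does internally.

```python
def insert_size_histogram(insert_sizes, maxinsert=3000, buckets=1000):
    """Count insert sizes into fixed-width buckets, skipping oversized pairs."""
    # Ceiling division so that every size in 0..maxinsert fits in a bucket.
    width = max(1, -(-(maxinsert + 1) // buckets))
    hist = [0] * buckets
    for size in insert_sizes:
        if size < 0 or size > maxinsert:
            continue  # pairs above maxinsert are filtered out
        hist[size // width] += 1
    return hist, width

def modal_insert(hist, width):
    """Return the lower bound of the most populated bucket."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    return peak * width
```

For example, with maxinsert=3000 the sketch uses buckets 4 bases wide, so insert sizes of 300, 301, and 302 all fall into the same bucket.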

Edge Quality Assessment

Each potential scaffold connection is evaluated using multiple criteria:

Scaffold Path Finding

The scaffolding process uses findLeftmost() and expandRight() methods with bestEdge() selection:
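
One plausible reading of this walk is a greedy traversal: step left along the best edge until no further contig is found, then extend right from that point. The Python sketch below borrows the method names only as labels. Its behavior, the dictionary-based edge tables, and the cycle handling are assumptions, and it ignores contig orientation for brevity.

```python
def best_edge(edges, node):
    """Return the heaviest neighbor of node, or None if it has no edges."""
    options = edges.get(node, {})
    if not options:
        return None
    return max(options, key=options.get)

def find_leftmost(left_edges, start):
    """Follow best left edges until none remain or a cycle is detected."""
    seen = {start}
    node = start
    while True:
        prev = best_edge(left_edges, node)
        if prev is None or prev in seen:
            return node
        seen.add(prev)
        node = prev

def expand_right(right_edges, start):
    """Build a path by repeatedly following the best right edge."""
    path = [start]
    seen = {start}
    while True:
        nxt = best_edge(right_edges, path[-1])
        if nxt is None or nxt in seen:
            return path
        seen.add(nxt)
        path.append(nxt)
```

Here left_edges and right_edges are hypothetical dictionaries mapping a contig name to its neighbors and their edge weights. Starting from any contig in a chain, find_leftmost returns the chain's left end and expand_right returns the ordered list of contigs to place in one scaffold.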

Thread Safety and Performance

Lilypad uses AtomicIntegerArray, AtomicLongArray, and ReadWriteLock for scalability:

Quality Control Features

Multiple validation layers ensure scaffold quality:

Memory Management

Memory usage patterns:

Technical Notes

Input Requirements

Performance Considerations

Common Issues

Output Format

Support

For questions and support: