SynthMDA

Script: synthmda.sh Package: synth Class: SynthMDA.java

Generates synthetic reads by simulating Multiple Displacement Amplification (MDA) through iterative stochastic fragment selection and amplification, producing characteristic uneven coverage distributions found in single-cell genomics.

Basic Usage

synthmda.sh in=<reference> out=<reads out file>

Input may be fasta or fastq, compressed or uncompressed.

Parameters

Parameters control the MDA amplification simulation and read generation process.

Parameters

reads=12000000: Generate this many reads.
paired=t: Generate paired reads.
length=150: Reads should be this long.
minlen=4000: Min length of MDA contig.
maxlen=150000: Max length of MDA contig.
cycles=9: Number of MDA cycles; higher is more spiky. Each cycle calls amplify() method with the specified ratio parameter.
initialratio=1.3: Fraction of genome initially replicated in first amplification round; lower values create more spiky coverage patterns.
ratio=1.7: Fraction of genome replicated per cycle after initial round. Default differs from shell (1.7) vs Java code (2.0).
refout=null: Write MDA'd genome to this file. If null, generates temporary filename using ReadWrite.stripToCore() + random hex suffix.
perfect=0: Fraction of reads that will be error-free. Passed as 'perfect' parameter to RandomReads3.main().
amp=200: 'amp' flag sent to RandomReads3 for coverage distribution skewing (higher values create more uneven coverage).
build=7: Build index location for MDA'd genome. Parsed by Parser.build and passed to RandomReads3.
prefix=null: Generated reads will start with this prefix.
overwrite=t: (ow) Set to false to force the program to abort rather than overwrite an existing file.

Java Parameters

-Xmx: This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom: This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da: Disable assertions.

Examples

Basic MDA Simulation

synthmda.sh in=genome.fa out=mda_reads.fq reads=10000000

Generate 10 million synthetic reads from a reference genome with default MDA parameters.

High Coverage Spiky Distribution

synthmda.sh in=reference.fa out=reads.fq cycles=12 initialratio=1.1 amp=400

Create a more uneven coverage distribution with higher cycles and amp values.

Single-End Reads with Custom Length

synthmda.sh in=genome.fa out=reads.fq paired=f length=250 reads=5000000

Generate 5 million single-end reads of 250bp length.

Algorithm Details

SynthMDA simulates Multiple Displacement Amplification (MDA) through the amplify() method using ByteBuilder data structures and Shared.threadLocalRandom() for stochastic fragment generation.

Reference Processing

ConcurrentReadInputStream loads the reference genome through FileFormat.testInput()
Sequences are concatenated into a single ByteBuilder with '$' delimiters as sequence boundaries
The concatenated genome becomes the template for iterative amplification cycles

Amplification Algorithm

The amplify() method implements the core MDA simulation:

Initial amplification: amplify(bb, false, minlen, maxlen, initialRatio) creates the first amplification round
Cyclic amplification: For each of the specified cycles, amplify(dest, i<1, minlen, maxlen, ratio) is called
Fragment generation: randy.nextInt(slen) selects random start positions in the source ByteBuilder
Length determination: minlen+randy.nextInt(range) where range=maxlen-minlen+1
Orientation selection: Tools.nextBoolean(randy) determines forward vs reverse-complement strand
Boundary detection: Fragment extension stops when encountering '$' delimiters
Reverse complement: AminoAcid.baseToComplementExtended[b] performs base complementation for reverse fragments
Size control: Maximum amplified genome size limited to 1.5GB (1500000000 bytes)

Read Generation

After amplification cycles complete, synthetic reads are generated via RandomReads3.main():

Error model parameters: Fixed error rates configured in the ArrayList<String> parameter list
SNP rate: 0.02 (2%) via snprate parameter
Deletion rate: 0.005 (0.5%) via delrate parameter
Insertion rate: 0.005 (0.5%) via insrate parameter
N rate: 0.005 (0.5%) via nrate parameter
Quality scores: minq=16, midq=25, maxq=38 with Phred scaling
Error clustering: maxinss=2, maxdels=2, maxns=2, maxsnps=2 limits consecutive errors

Memory Management

ByteBuilder retention: retain=false for genomes ≥500MB to prevent memory overflow
Fragment length checking: Fragments shorter than minlen2 (minimum 4000bp) are rejected
Goal calculation: Target amplification size = min(slen*ratio, 600000000) to cap memory usage
Temporary file cleanup: Amplified reference deleted automatically when deleteRef=true

The resulting coverage distribution exhibits the characteristic uneven, spiky pattern of MDA-amplified single-cell genomes through the stochastic fragment selection and variable amplification ratios per cycle.

Support

For questions and support:

Email: bbushnell@lbl.gov
Documentation: bbmap.org