MakeChimeras

Script: makechimeras.sh
Package: synth
Class: MakeChimeras.java

Makes chimeric sequences from nonchimeric sequences. Designed for PacBio reads.

Basic Usage

makechimeras.sh in=<input> out=<output> chimeras=<integer>

Creates artificial chimeric sequences by randomly fusing pieces of input reads. The tool is intended for generating synthetic chimeric PacBio reads for testing and validation.

Parameters

Parameters are organized by their function in the chimera creation process.

Input Parameters

in=<file>
The input file containing nonchimeric reads, in fasta or fastq format, compressed or uncompressed.
unpigz=t
Decompress gzipped input with pigz (parallel gzip) when it is available, which is faster than single-threaded decompression.

Output Parameters

out=<file>
Fasta output destination. The output file will contain the generated chimeric sequences.
chimeras=-1
Number of chimeras to create. This parameter is required: set it to a positive integer giving how many synthetic chimeric sequences to generate.
forcelength=0
If set to a positive number X, one parent will be length X and the other will be length - X, which fixes how the chimera's length is split between its two parents. When set to 0, piece lengths are chosen at random.

Java Parameters

-Xmx
Sets Java's memory usage, overriding autodetection. For example, -Xmx20g specifies 20 GB of RAM and -Xmx200m specifies 200 MB. The maximum is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Basic Chimera Generation

makechimeras.sh in=input_reads.fasta out=chimeric_reads.fasta chimeras=1000

Creates 1000 synthetic chimeric sequences from the input reads by randomly selecting and fusing pieces from different source sequences.
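
One simple way to check a run like this is to count the records in the output and compare the result with the chimeras value. The following is a minimal sketch in Python, not part of the tool; the function name count_fasta_records is hypothetical, and it assumes the output is ordinary fasta with one ">" header line per record.

```python
import gzip

def count_fasta_records(path):
    """Count fasta records by counting header lines that start with '>'."""
    # Transparently handle gzip-compressed files based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For the example above, count_fasta_records("chimeric_reads.fasta") could then be compared against the requested 1000.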

Forced Length Distribution

makechimeras.sh in=pacbio_reads.fasta out=test_chimeras.fasta chimeras=500 forcelength=5000

Generates 500 chimeric sequences, each constructed from one piece of exactly 5000 bp and a second piece that makes up the remaining length.
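
The length arithmetic can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the tool's code: the function name forced_piece_lengths is hypothetical, and the total length passed in is a made-up value, since the documentation does not say which length the remainder is measured against.

```python
def forced_piece_lengths(total_length, forcelength):
    """Split total_length into (forcelength, total_length - forcelength)."""
    if forcelength <= 0:
        raise ValueError("forcelength must be positive for a forced split")
    if forcelength >= total_length:
        raise ValueError("forcelength must be smaller than total_length")
    # One piece has the forced length; the other makes up the remainder.
    return forcelength, total_length - forcelength

# Assumed example: a 12000 bp total with forcelength=5000.
print(forced_piece_lengths(12000, 5000))  # (5000, 7000)
```

The sketch rejects cases where the remainder would be zero or negative. How the tool itself treats reads that are too short for the requested split is not stated here.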

Processing Compressed Input

makechimeras.sh in=reads.fasta.gz out=chimeras.fasta chimeras=2000 unpigz=t

Processes compressed input using parallel gzip decompression for better performance on large datasets.

Algorithm Details

Chimera Construction Process

MakeChimeras creates synthetic chimeric sequences by random selection, calling Random.nextInt() on the generator returned by Shared.threadLocalRandom().
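
The general idea of random selection and fusion can be sketched in a few lines of Python. This is a simplified illustration, not the MakeChimeras implementation: the function names are hypothetical, and the choices made here (two distinct parents, uniformly random piece length and start position, no strand flipping or quality handling) are assumptions that the documentation does not confirm.

```python
import random

def get_piece(seq, rng):
    """Return a random contiguous piece of seq, at least 1 base long."""
    length = rng.randint(1, len(seq))          # piece length in [1, len(seq)]
    start = rng.randint(0, len(seq) - length)  # any start that fits the piece
    return seq[start:start + length]

def make_chimera(reads, rng):
    """Fuse random pieces taken from two different, randomly chosen reads."""
    i, j = rng.sample(range(len(reads)), 2)    # two distinct parent indices
    return get_piece(reads[i], rng) + get_piece(reads[j], rng)

def make_chimeras(reads, count, seed=None):
    """Generate count chimeras; a fixed seed makes the run reproducible."""
    rng = random.Random(seed)
    return [make_chimera(reads, rng) for _ in range(count)]
```

Here rng.randint and rng.sample play the role that Random.nextInt() plays in Java. Passing a seed is a convenience of this sketch for repeatable tests.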

Random Read Selection Strategy

Fragment Extraction Methods

The tool uses different strategies for extracting fragments from parent reads:

Sequence Assembly and Post-Processing

Performance Characteristics

Quality Control Features

Technical Notes

Input Requirements

Memory Considerations

Output Characteristics

Support

For questions and support: