AdjustHomopolymers

Script: adjusthomopolymers.sh
Package: jgi
Class: AdjustHomopolymers.java

Shrinks or expands homopolymers in DNA sequences. Each homopolymer run (a stretch of consecutive identical bases) is lengthened or shortened in proportion to its length, as controlled by the rate parameter.

Basic Usage

adjusthomopolymers.sh in=<input file> out=<output file> rate=<float>

Input may be fasta or fastq, compressed or uncompressed.

Parameters

Parameters are grouped as in the shell script's usage text, and every parameter listed in the shell script is documented below.

Standard parameters

in=<file>
Primary input, or read 1 input.
in2=<file>
Read 2 input if reads are in two files.
out=<file>
Primary output, or read 1 output.
out2=<file>
Read 2 output if reads are in two files.
overwrite=f
(ow) Set to false to force the program to abort rather than overwrite an existing file.
ziplevel=2
(zl) Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.

Processing parameters

rate=0.0
Controls homopolymer adjustment. Positive values expand homopolymers (rate=0.1 expands by 10%), negative values shrink them (rate=-0.1 shrinks by 10%). Default is 0.0 (no change).
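The worked examples in this document compute a run's new length as length + length*rate. Assuming the truncation used in the shrinking example applies to every run (the rounding rule is not otherwise specified), a run of length L under rate r becomes L' bases:

```latex
L' = \left\lfloor L + L\,r \right\rfloor, \qquad
\left\lfloor 5 + 5(0.2) \right\rfloor = 6, \qquad
\left\lfloor 10 + 10(-0.15) \right\rfloor = \left\lfloor 8.5 \right\rfloor = 8
```

How runs are handled when L' would reach 0 (very short runs with strongly negative rates) is not documented here.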

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Examples

Expanding homopolymers by 20%

adjusthomopolymers.sh in=reads.fq out=expanded.fq rate=0.2

Expands all homopolymer runs by 20%. A run of 5 A's would become 6 A's (5 + 5*0.2 = 6).

Shrinking homopolymers by 15%

adjusthomopolymers.sh in=reads.fq out=shrunk.fq rate=-0.15

Shrinks all homopolymer runs by 15%. A run of 10 T's would become 8 T's (10 + 10*(-0.15) = 8.5, truncated to 8).

Processing paired reads

adjusthomopolymers.sh in1=reads1.fq in2=reads2.fq out1=adj1.fq out2=adj2.fq rate=0.1

Processes paired-end reads, expanding homopolymers by 10% in both files.

Algorithm Details

Homopolymer Detection and Adjustment:

The algorithm processes each read base-by-base using a streak counter approach:
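A minimal sketch of such a streak-counter pass is shown below. It is not taken from AdjustHomopolymers.java: the class and method names are hypothetical, quality scores are ignored, and three behaviors are assumptions of this sketch: truncation of length + length*rate (as in the shrinking example), a small epsilon guarding against floating-point error, and a minimum output length of 1 so that no run is deleted outright.

```java
/**
 * Hypothetical sketch of a streak-counter pass over one read's bases.
 * Not taken from AdjustHomopolymers.java; quality scores are ignored.
 */
final class HomopolymerAdjuster {

    // Assumed guard against floating-point error (e.g. 8.9999999 truncating to 8).
    private static final double EPSILON = 1e-9;

    private HomopolymerAdjuster() {}

    /** Rewrites every run of identical characters at its adjusted length. */
    static String adjust(String bases, double rate) {
        if (bases.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(bases.length());
        char prev = bases.charAt(0);
        int streak = 1; // length of the run currently being counted
        for (int i = 1; i < bases.length(); i++) {
            char b = bases.charAt(i);
            if (b == prev) {
                streak++; // the run continues
            } else {
                appendRun(out, prev, streak, rate); // the run ended: emit it resized
                prev = b;
                streak = 1;
            }
        }
        appendRun(out, prev, streak, rate); // flush the final run
        return out.toString();
    }

    /** Appends base (streak + streak * rate) times, truncated, but at least once (assumption). */
    private static void appendRun(StringBuilder out, char base, int streak, double rate) {
        int newLength = Math.max(1, (int) Math.floor(streak + streak * rate + EPSILON));
        for (int k = 0; k < newLength; k++) {
            out.append(base);
        }
    }

    public static void main(String[] args) {
        System.out.println(adjust("GAAAAAC", 0.2));        // prints GAAAAAAC
        System.out.println(adjust("CTTTTTTTTTTG", -0.15)); // prints CTTTTTTTTG
    }
}
```

Under the truncation assumption, short runs can pass through unchanged: at rate=0.2 a run needs at least 5 bases to gain one, since 4 + 4*0.2 = 4.8 truncates to 4. Whether the real tool rounds differently, or treats N or lowercase bases specially, is not stated in this document.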

Implementation Details:

Use Cases:

Support

For questions and support: