PlotGC

Script: plotgc.sh Package: driver Class: PlotGC.java

Prints sequence GC content once per interval.

Basic Usage

plotgc.sh in=<input file> out=<output file>

PlotGC analyzes sequence GC content by dividing each sequence into fixed-size intervals and calculating the GC content of each interval, reported as a fraction (for example, 0.423 for 42.3% GC). The output is a tab-delimited file containing position and GC content information.

Parameters

Parameters control input/output files, interval settings, and position calculations.

Input/Output Parameters

in=<file>
Input file. Accepts FASTA or FASTQ format (compressed or uncompressed). Use in=stdin.fa to pipe from stdin.
out=<file>
Output file for tab-delimited GC content data. Use out=stdout to pipe to stdout. Default: stdout.txt

Analysis Parameters

interval=1000
Interval length in base pairs. The sequence is divided into intervals of this size for GC calculation.
offset=0
Position offset for coordinates. Use offset=1 for 1-based indexing instead of the default 0-based indexing.
psb=t
(printshortbins) Print GC content for the last bin of a contig even when it is shorter than the specified interval length. Set psb=f to skip incomplete final intervals.
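
The three analysis parameters interact roughly as sketched below. This is a minimal Python illustration, not PlotGC's Java implementation. The function name gc_intervals is hypothetical, and the sketch assumes that GC content is the number of G and C bases divided by the number of bases in the bin, and that the reported position is the bin's start plus the offset. How the tool itself treats ambiguous bases such as N is not covered here.

```python
def gc_intervals(seq, interval=1000, offset=0, print_short_bins=True):
    """Yield (position, gc_fraction) for consecutive, non-overlapping bins.

    Hypothetical helper for illustration only; not part of PlotGC.
    """
    seq = seq.upper()
    for start in range(0, len(seq), interval):
        chunk = seq[start:start + interval]
        # The final bin of a sequence may be shorter than `interval`;
        # psb=f corresponds to skipping it.
        if len(chunk) < interval and not print_short_bins:
            break
        gc = chunk.count("G") + chunk.count("C")
        # Assumption: the denominator is the number of bases in the bin.
        yield start + offset, gc / len(chunk)


if __name__ == "__main__":
    # A 10-base toy sequence with interval=4 gives two full bins
    # and one short bin of 2 bases.
    for position, gc in gc_intervals("GGCCAATTGC", interval=4, offset=1):
        print(position, gc)
```

Passing print_short_bins=False in the toy example drops the trailing 2-base bin, which mirrors the documented effect of psb=f.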

Java Parameters

-Xmx
Set Java's memory usage, overriding automatic memory detection. Examples: -Xmx20g for 20 gigabytes, -Xmx200m for 200 megabytes. The maximum is typically 85% of physical memory.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+.
-da
Disable assertions.

Output Format

The output is a tab-delimited file with one row per interval, giving the interval's position and its GC content.

Examples

Basic GC Content Analysis

plotgc.sh in=genome.fasta out=gc_content.txt

Calculate GC content in 1000bp intervals for a genome assembly.

Custom Interval Size

plotgc.sh in=sequences.fq out=gc_plot.txt interval=500

Use 500bp intervals instead of the default 1000bp for higher-resolution analysis.

1-based Coordinates

plotgc.sh in=contigs.fa out=gc_analysis.txt offset=1 psb=f

Use a 1-based coordinate system and skip incomplete final intervals.
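
To judge how the interval and psb settings affect output size, the number of data rows per contig can be estimated with simple arithmetic. The sketch below is illustrative Python with a hypothetical function name (expected_rows). It assumes one output row per bin and does not count any header line.

```python
def expected_rows(contig_length, interval=1000, print_short_bins=True):
    """Estimate the number of bins reported for one contig.

    Hypothetical helper for illustration only; not part of PlotGC.
    """
    full_bins, remainder = divmod(contig_length, interval)
    # A leftover partial bin is reported only when short bins are printed.
    if remainder and print_short_bins:
        return full_bins + 1
    return full_bins


if __name__ == "__main__":
    print(expected_rows(10_500, interval=500))
    print(expected_rows(10_500, interval=1000))
    print(expected_rows(10_500, interval=1000, print_short_bins=False))
```

Under these assumptions a 10,500bp contig yields 21 bins at interval=500, 11 bins at interval=1000 with psb=t, and 10 bins at interval=1000 with psb=f.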

Pipeline Integration

cat sequences.fasta | plotgc.sh in=stdin.fa out=stdout | head -20

Process sequences from stdin and display the first 20 lines of GC data.

Algorithm Details

PlotGC computes GC content over consecutive fixed-size intervals (bins) of each sequence.

GC Calculation Method

Interval Processing

Memory Efficiency

Output Precision

GC content is formatted to 3 decimal places (e.g., 0.423 for 42.3% GC content). Position coordinates are adjusted by the offset parameter, allowing for both 0-based (default) and 1-based coordinate systems.
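
The formatting described above can be sketched as follows. This is an illustrative Python fragment, not the tool's code. The function name format_row and the two-column layout (position, then GC) are assumptions made for the example, and the real output may contain additional columns. The rounding behavior of the tool may also differ from Python's.

```python
def format_row(start, gc_fraction, offset=0):
    """Format one output row as 'position<TAB>gc' with 3 decimal places.

    Hypothetical helper for illustration only; not part of PlotGC.
    `start` is the 0-based start of the interval.
    """
    # offset=0 keeps 0-based coordinates; offset=1 shifts them to 1-based.
    return f"{start + offset}\t{gc_fraction:.3f}"


if __name__ == "__main__":
    print(format_row(0, 0.4234))            # 0-based position
    print(format_row(0, 0.4234, offset=1))  # same interval, 1-based position
```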

Performance Considerations

Common Use Cases

Support

For questions and support: