VCF2GFF

Script: vcf2gff.sh
Package: gff
Class: GffLine.java

Generates a GFF3 file from a VCF file.

Basic Usage

vcf2gff.sh in=<vcf file> out=<gff file>

VCF2GFF converts Variant Call Format (VCF) files to General Feature Format version 3 (GFF3) files. Each variant call in the VCF becomes a sequence_variant_obs feature in the GFF3 output.

Parameters

VCF2GFF has a minimal parameter set focused on file input/output specification.


in=<file>
Input VCF file in standard VCF format; its variant calls are converted to GFF3 features. May be gzip-compressed.
out=<file>
Output GFF file. Will be written in GFF3 format with sequence_variant_obs features representing each variant.

Examples

Basic VCF to GFF3 Conversion

vcf2gff.sh in=variants.vcf out=variants.gff3

Converts a VCF file containing variant calls to GFF3 format.

Converting Compressed VCF

vcf2gff.sh in=variants.vcf.gz out=variants.gff3

Processes a gzip-compressed VCF file and outputs uncompressed GFF3.

Pipeline Usage

# Call variants and convert to GFF3
callvariants.sh in=reads.sam ref=reference.fa out=variants.vcf
vcf2gff.sh in=variants.vcf out=variants.gff3

Calls variants with callvariants.sh, then converts the resulting VCF to GFF3 for use in annotation workflows.

Algorithm Details

VCF2GFF implements direct format conversion using the GffLine(VCFLine vcf) constructor, which transforms VCF variant records into standardized GFF3 sequence_variant_obs features through byte-level parsing and coordinate transformation.
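To make the record-to-feature mapping concrete, here is a minimal, self-contained sketch of turning one VCF data line into one GFF3 line. It is not the BBTools GffLine constructor: the class name VcfRecordToGffSketch, the "." source column, the "+" strand, the end coordinate, and the lowercase ref=/alt= attributes are all assumptions made for illustration.

```java
/** Hypothetical stand-in for the conversion step; not the BBTools GffLine class. */
final class VcfRecordToGffSketch {

    /** Converts one tab-delimited VCF data line into one 9-column GFF3 line. */
    static String convert(String vcfLine) {
        String[] f = vcfLine.split("\t");
        if (f.length < 6) {
            throw new IllegalArgumentException("Expected at least 6 VCF columns");
        }
        String seqid = f[0];                 // CHROM
        int pos = Integer.parseInt(f[1]);    // POS, 1-based in VCF
        String ref = f[3];                   // REF allele
        String alt = f[4];                   // ALT allele
        String score = f[5];                 // QUAL, copied through unchanged

        // Assumption: the feature spans the REF allele, 1-based inclusive.
        int end = pos + ref.length() - 1;

        // Assumption: lowercase, application-defined attribute names.
        String attributes = "ref=" + ref + ";alt=" + alt;

        return String.join("\t",
                seqid,
                ".",                         // source (assumed unset)
                "sequence_variant_obs",      // feature type named in this document
                Integer.toString(pos),
                Integer.toString(end),
                score,
                "+",                         // strand (assumed forward)
                ".",                         // phase does not apply to variants
                attributes);
    }
}
```

Calling VcfRecordToGffSketch.convert on a line with CHROM chr1, POS 100, REF A, ALT G and QUAL 37.5 yields a sequence_variant_obs feature on chr1 with start and end 100 and score 37.5.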

Conversion Architecture

Variant Type Classification

The variant type is read with vcf.type() and encoded as the corresponding Var.typeArray[vtype] constant; the output string is assembled with a ByteBuilder.
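The following sketch shows one common way to classify a variant from its REF and ALT alleles and to build a short attribute string in a small pre-sized buffer. The type names, the classification rules, and the type= attribute are assumptions for illustration; they are not the contents of Var.typeArray, and StringBuilder stands in for ByteBuilder. The sketch also assumes a single ALT allele per record.

```java
/** Hypothetical classifier; the real type table lives in the BBTools Var class. */
final class VariantTypeSketch {

    enum Type { SUB, INS, DEL, COMPLEX }

    /** Classifies a single-ALT record by comparing allele lengths and prefixes. */
    static Type classify(String ref, String alt) {
        if (ref.length() == alt.length()) {
            return Type.SUB;                 // same length: substitution
        }
        if (ref.length() < alt.length() && alt.startsWith(ref)) {
            return Type.INS;                 // ALT extends REF: insertion
        }
        if (ref.length() > alt.length() && ref.startsWith(alt)) {
            return Type.DEL;                 // REF extends ALT: deletion
        }
        return Type.COMPLEX;                 // anything else
    }

    /** Builds an attribute string in a buffer with a 16-character initial capacity. */
    static String attribute(String ref, String alt) {
        StringBuilder sb = new StringBuilder(16);
        sb.append("type=").append(classify(ref, alt).name());
        return sb.toString();
    }
}
```

A small initial capacity suits short attribute strings like these, because the buffer rarely needs to grow.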

Quality Score Processing

VCF QUAL values are cast directly to float (score=(float)vcf.qual), so the variant call confidence score is carried into the GFF3 output without rescaling or normalization.
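As a small illustration of what such a cast does, the sketch below narrows a double to a float. It assumes the QUAL value is held as a double before the cast, which the source does not state; the class and method names are hypothetical.

```java
/** Hypothetical helper showing a double-to-float narrowing cast. */
final class QualScoreSketch {

    /** Narrows a QUAL value to float; the value is rounded, not rescaled. */
    static float toScore(double qual) {
        return (float) qual;
    }
}
```

Values that a float can represent exactly, such as 37.5, pass through unchanged. Other values are rounded to the nearest float, so a QUAL with many significant digits may lose its trailing digits.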

Coordinate System Implementation

Feature start and end coordinates are computed from VCFLine accessor methods.
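Both VCF POS and GFF3 start/end are 1-based, and GFF3 intervals are inclusive at both ends. The sketch below shows one plausible boundary calculation, assuming the feature covers exactly the bases of the REF allele. The actual accessor logic in VCFLine may treat insertions and deletions differently; the class and method names here are hypothetical.

```java
/** Hypothetical span calculation; not the VCFLine accessors themselves. */
final class VariantSpanSketch {

    /**
     * Returns {start, end} as 1-based inclusive GFF3 coordinates,
     * assuming the feature covers the REF allele beginning at POS.
     */
    static int[] span(int pos, String ref) {
        int start = pos;
        // An inclusive end is start + length - 1; an empty REF is treated as one base.
        int end = pos + Math.max(ref.length(), 1) - 1;
        return new int[] {start, end};
    }
}
```

For a single-base REF at POS 100 this gives start=100 and end=100; for REF ACGT at POS 100 it gives start=100 and end=103.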

Memory Management

Input is processed line by line through a ByteFile stream, and output strings are built with a ByteBuilder (16-byte initial capacity), so the file is never buffered in full. The 200 MB heap (-Xmx200m) is enough for VCFLine object instantiation and temporary string construction.
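The streaming pattern described above can be sketched with standard java.io classes. This is not the ByteFile implementation: StreamingConvertSketch and the toGff function parameter are hypothetical, and writing the ##gff-version 3 directive as the first output line follows the GFF3 specification rather than anything stated about this tool.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.function.Function;

/** Hypothetical streaming loop; memory use is bounded by the longest line. */
final class StreamingConvertSketch {

    /** Converts data lines one at a time and returns the number of records written. */
    static long convert(Reader in, Writer out, Function<String, String> toGff)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        writer.write("##gff-version 3\n");   // version directive required by GFF3
        long records = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                    // skip VCF meta and header lines
            }
            writer.write(toGff.apply(line)); // convert exactly one record
            writer.write('\n');
            records++;
        }
        writer.flush();
        return records;
    }
}
```

For gzip-compressed input, the underlying stream can be wrapped in java.util.zip.GZIPInputStream and an InputStreamReader before being passed in; the loop itself does not change.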

File Format Details

Input VCF Requirements

Output GFF3 Format

The output follows the GFF3 specification, with each variant written as a sequence_variant_obs feature.

Use Cases

Support

For questions and support: