Phylip2Fasta

Script: phylip2fasta.sh Package: jgi Class: PhylipToFasta.java

Transforms interleaved phylip to fasta format. The tool reads a phylip file containing interleaved sequence data and writes the sequences in standard FASTA format, which most bioinformatics tools accept.

Basic Usage

phylip2fasta.sh in=<input> out=<output>

Input may be stdin or an interleaved phylip file, compressed or uncompressed. The input phylip file is the only required parameter.

Parameters

Parameters are grouped by function. Every parameter listed in the shell script's usage message is documented below.

Input Parameters

in=<file>
The input phylip file; this is the only required parameter. Accepts stdin or a phylip file, compressed or uncompressed. The file should be in interleaved phylip format, with sequence names followed by sequence data in blocks.
unpigz=true
Decompress gzip-compressed input with pigz for faster decompression. Has no effect on uncompressed input. Default: true
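
As a rough illustration of what accepting stdin, compressed, or uncompressed input involves, the Python sketch below picks a reader based on the file name. It is not the tool's implementation: phylip2fasta.sh is written in Java and uses pigz when unpigz=true. The helper name open_phylip_text and the use of the literal name "stdin" are assumptions made for this example.

```python
import gzip
import sys


def open_phylip_text(path):
    """Return a text-mode handle for stdin, a .gz file, or a plain file."""
    if path == "stdin":
        # Assumption for this sketch: the literal name "stdin" means standard input.
        return sys.stdin
    if path.endswith(".gz"):
        # Mode "rt" makes gzip decode bytes to text, like a regular open().
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")
```

The caller can then iterate over lines in the same way for all three kinds of input, which keeps the parsing code independent of how the data arrives.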

Output Parameters

out=<file>
Fasta output destination. Specifies where to write the converted FASTA sequences. If not specified, output goes to stdout. The output will be in standard FASTA format with sequence headers starting with '>' followed by the sequence name from the phylip file.
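
The output layout described above can be sketched in a few lines of Python. The function format_fasta and its width parameter are hypothetical names for this illustration. Whether phylip2fasta.sh wraps sequence lines is not stated here, so the sketch writes each sequence on a single line unless a width is given.

```python
def format_fasta(records, width=0):
    """Render (name, sequence) pairs as FASTA text.

    width=0 keeps each sequence on one line; a positive width wraps it.
    """
    out = []
    for name, seq in records:
        # Each record starts with a header line: '>' followed by the name.
        out.append(">" + name)
        if width > 0:
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        else:
            out.append(seq)
    return "".join(line + "\n" for line in out)
```

For example, format_fasta([("taxon1", "ACGTACGT")]) yields a header line ">taxon1" followed by the line "ACGTACGT".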

Java Parameters

-Xmx
This will set Java's memory usage, overriding autodetection. -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically 85% of physical memory. Default memory allocation is 1GB for this tool.
-eoom
This flag will cause the process to exit if an out-of-memory exception occurs. Requires Java 8u92+. Useful for automated pipelines where graceful failure is preferred over hanging processes.
-da
Disable assertions. Can provide minor performance improvements in production environments by skipping assertion checks in the Java code.

Examples

Basic Conversion

phylip2fasta.sh in=alignment.phy out=alignment.fasta

Converts an interleaved phylip file to FASTA format.

Processing Compressed Input

phylip2fasta.sh in=alignment.phy.gz out=alignment.fasta

Converts a gzip-compressed phylip file to FASTA format. Decompression uses pigz by default (unpigz=true).

Using Standard Input/Output

cat alignment.phy | phylip2fasta.sh in=stdin

Reads phylip data from standard input and writes FASTA output to standard output.

With Custom Memory Settings

phylip2fasta.sh -Xmx4g in=large_alignment.phy out=large_alignment.fasta

Processes a large phylip file with 4GB of memory allocated to Java.

Algorithm Details

The phylip2fasta conversion algorithm implements a two-phase parsing strategy optimized for interleaved phylip format:

Phase 1: Header and Initial Sequence Parsing

Phase 2: Interleaved Block Processing
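
The two phases named above can be illustrated with a minimal Python sketch. It is an assumption about what each phase does, based on the usual layout of interleaved phylip, and not a transcription of PhylipToFasta.java: the first line gives the number of taxa and characters, the first block holds a name plus sequence for each taxon, and later blocks hold sequence only, in the same taxon order. The function name parse_interleaved_phylip is hypothetical, and names are assumed to contain no whitespace (relaxed phylip).

```python
def parse_interleaved_phylip(lines):
    """Parse relaxed interleaved phylip lines into a list of (name, sequence)."""
    rows = [ln.strip() for ln in lines]
    rows = [ln for ln in rows if ln]  # blank lines only separate blocks
    if not rows:
        raise ValueError("empty input")
    ntax, nchar = (int(x) for x in rows[0].split()[:2])
    body = rows[1:]
    if len(body) < ntax:
        raise ValueError("fewer sequence lines than taxa")

    # Phase 1: the first block carries a name plus the first chunk of sequence.
    names, chunks = [], []
    for ln in body[:ntax]:
        parts = ln.split(None, 1)
        names.append(parts[0])
        rest = parts[1] if len(parts) > 1 else ""
        chunks.append(["".join(rest.split())])  # drop spaces inside the data

    # Phase 2: later blocks hold sequence only, cycling through the taxa in order.
    for i, ln in enumerate(body[ntax:]):
        chunks[i % ntax].append("".join(ln.split()))

    records = [(n, "".join(c)) for n, c in zip(names, chunks)]
    for name, seq in records:
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(seq)}")
    return records
```

Collecting chunks per taxon and joining once at the end avoids repeated string concatenation. The length check against the header is a sanity test added for this sketch.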

Memory Management

File Format Support

Error Handling

Technical Notes

Phylip Format Requirements

Performance Characteristics

Output Format

Support

For questions and support: