EstherFilter

Script: estherfilter.sh
Package: driver
Class: EstherFilter.java

BLASTs queries against a reference and discards hits with scores below 'cutoff'. The score is taken from column 12 of the tabular BLAST output, which is the bit score. The BLAST command used is:
blastall -p blastn -i QUERY -d REFERENCE -e 0.00001 -m 8
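The per-hit filtering rule can be illustrated with a small Java sketch. It assumes the standard 12-column, tab-separated `-m 8` layout (query ID in column 1, bit score in column 12) and a numeric cutoff. The class and method names (`BlastHitFilter`, `queryName`, `score`, `passes`) are hypothetical and are not part of BBTools.

```java
// Hypothetical helper (not part of BBTools): parses one line of
// "blastall -m 8" tabular output and applies the score cutoff.
public final class BlastHitFilter {

    /** Query ID from column 1 (index 0 after splitting on tabs). */
    static String queryName(String line) {
        return line.split("\t")[0];
    }

    /** Bit score from column 12 (index 11 after splitting on tabs). */
    static double score(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 12) {
            throw new IllegalArgumentException(
                    "Expected 12 tab-separated columns, got " + cols.length);
        }
        return Double.parseDouble(cols[11].trim());
    }

    /** Keep the hit only if its score is at least the cutoff. */
    static boolean passes(String line, double cutoff) {
        return score(line) >= cutoff;
    }

    public static void main(String[] args) {
        String hit = "read1\tgeneA\t98.50\t600\t9\t0\t1\t600\t101\t700\t1e-300\t1083";
        // prints: read1 1083.0 true
        System.out.println(queryName(hit) + " " + score(hit) + " " + passes(hit, 1000));
    }
}
```

A hit scoring exactly the cutoff is kept, which matches the "scores >= cutoff" rule described under Parameters.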

Basic Usage

estherfilter.sh <query> <reference> <cutoff>

This tool requires three positional arguments and accepts an optional fourth argument, fasta.

Parameters

All parameters are positional: three required arguments, followed by one optional argument.

Positional Arguments

<query>
Input FASTA file containing query sequences to be BLASTed
<reference>
Reference database file (must be BLAST-formatted)
<cutoff>
Minimum BLAST score threshold, compared against the bit score in column 12. Only hits with scores >= cutoff are retained.

Optional Parameters

fasta
Optional fourth argument. When set to "fasta", outputs the full FASTA records of the passing queries instead of just their names. This mode uses more memory because it loads the entire query file.
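The argument contract above (three required positional arguments plus an optional "fasta") can be sketched in Java as follows. This is a hypothetical illustration, not EstherFilter's actual parsing code. It assumes the cutoff is parsed as a number and that any fourth value other than exactly "fasta" falls back to names-only output.

```java
// Hypothetical sketch of the argument contract described above; the real
// EstherFilter argument handling may differ.
public final class EstherArgs {
    final String query;        // <query>: FASTA file to BLAST
    final String reference;    // <reference>: BLAST-formatted database
    final double cutoff;       // <cutoff>: minimum score to keep a hit
    final boolean fastaOutput; // optional fourth argument "fasta"

    private EstherArgs(String query, String reference, double cutoff, boolean fastaOutput) {
        this.query = query;
        this.reference = reference;
        this.cutoff = cutoff;
        this.fastaOutput = fastaOutput;
    }

    static EstherArgs parse(String[] args) {
        if (args.length < 3 || args.length > 4) {
            throw new IllegalArgumentException(
                    "Usage: estherfilter.sh <query> <reference> <cutoff> [fasta]");
        }
        double cutoff = Double.parseDouble(args[2]);
        // Assumption: any fourth value other than "fasta" means names-only output.
        boolean fasta = args.length == 4 && "fasta".equals(args[3]);
        return new EstherArgs(args[0], args[1], cutoff, fasta);
    }

    public static void main(String[] args) {
        EstherArgs a = parse(new String[] {"reads.fasta", "genes.fasta", "1000", "fasta"});
        // prints: reads.fasta genes.fasta 1000.0 true
        System.out.println(a.query + " " + a.reference + " " + a.cutoff + " " + a.fastaOutput);
    }
}
```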

Examples

Basic Filtering

estherfilter.sh reads.fasta genes.fasta 1000 > results.txt

BLASTs reads.fasta against genes.fasta and outputs only the names of query sequences that have at least one BLAST hit with a score ≥ 1000.

FASTA Output

estherfilter.sh reads.fasta genes.fasta 1000 fasta > filtered_sequences.fasta

Same filtering as above, but outputs the actual FASTA sequences instead of just names. Uses more memory to load and process the query file.

Lower Stringency Filtering

estherfilter.sh contigs.fasta reference_genome.fasta 500 > matching_contigs.txt

Filters contigs against a reference genome using a lower score threshold of 500.

Algorithm Details

EstherFilter is a BLAST-based sequence filter. It runs BLAST as an external process via ReadWrite.getInputStreamFromProcess() and then filters the resulting tabular output by score. Processing proceeds in the phases described below.

BLAST Execution Phase

The tool launches BLAST through ReadWrite.getInputStreamFromProcess("foo", command, false, false, true) and reads the returned stream. The command is:

blastall -p blastn -i [query] -d [reference] -e 0.00001 -m 8
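The execution phase can be approximated with standard Java. This sketch deliberately swaps in java.lang.ProcessBuilder in place of BBTools' ReadWrite.getInputStreamFromProcess(), whose boolean arguments are not documented here. The class name `BlastRunner` is hypothetical, `launch` requires `blastall` on the PATH, and the sketch does not check BLAST's exit status.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the BLAST execution phase (uses ProcessBuilder,
// not BBTools' ReadWrite class).
public final class BlastRunner {

    /** Builds the blastall argument list shown above. */
    static List<String> buildCommand(String query, String reference) {
        return Arrays.asList("blastall", "-p", "blastn", "-i", query,
                "-d", reference, "-e", "0.00001", "-m", "8");
    }

    /** Starts BLAST and returns its standard output; stderr goes to the console. */
    static InputStream launch(String query, String reference) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(query, reference));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb.start().getInputStream();
    }

    /** Reads every non-empty line of tabular output from a stream. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints: blastall -p blastn -i reads.fasta -d genes.fasta -e 0.00001 -m 8
        System.out.println(String.join(" ", buildCommand("reads.fasta", "genes.fasta")));
        // A fake stream stands in for BLAST output; the blank line is skipped.
        InputStream fake = new ByteArrayInputStream(
                "a\tb\n\nc\td\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(fake).size()); // prints: 2
    }
}
```

Passing the command as a list, rather than as one string, avoids shell-quoting problems when file paths contain spaces.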

Score-Based Filtering Implementation

Two methods parse the BLAST output, one for each output mode:
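The following is an illustration of the kind of parsing the names-only mode needs, not the actual body of processToNames(). It assumes the names output lists each passing query once, in first-seen order. BLAST can report several hits per query, so the sketch de-duplicates with a LinkedHashSet. The class `NameFilter` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of names-only filtering over "-m 8" lines.
public final class NameFilter {

    /** Returns each query (column 1) with at least one hit scoring >= cutoff (column 12). */
    static List<String> passingNames(Iterable<String> blastLines, double cutoff) {
        Set<String> kept = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        for (String line : blastLines) {
            String[] cols = line.split("\t");
            if (cols.length < 12) {
                continue; // skip blank or malformed lines
            }
            double score = Double.parseDouble(cols[11].trim());
            if (score >= cutoff) {
                kept.add(cols[0]);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "r1\tg1\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t990",
                "r2\tg1\t99.5\t600\t3\t0\t1\t600\t1\t600\t0.0\t1100",
                "r2\tg2\t97.0\t580\t17\t1\t1\t580\t1\t580\t0.0\t1020",
                "r3\tg3\t100.0\t700\t0\t0\t1\t700\t1\t700\t0.0\t1290");
        System.out.println(passingNames(lines, 1000)); // prints: [r2, r3]
    }
}
```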

processToNames() Method

processToFasta() Method

FASTA Output Implementation

The outputFasta() method implements a two-stage sequence extraction process:

Stage 1: Name Sorting

Stage 2: Sequence Extraction
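One plausible reading of the two stages can be sketched in Java. Stage 1 sorts the passing names so that membership can be tested with Arrays.binarySearch. Stage 2 keeps every loaded query record whose ID is found. This is an assumption, not the documented body of outputFasta(). The sketch also assumes that BLAST's query ID is the first whitespace-delimited token of the FASTA header. Consistent with the memory note under Parameters, it loads all query records into memory first. `FastaExtractor` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-stage FASTA extraction described above.
public final class FastaExtractor {

    /** Loads every FASTA record into memory as {id, full record text}. */
    static List<String[]> loadRecords(BufferedReader in) {
        List<String[]> records = new ArrayList<>();
        StringBuilder current = null;
        String id = null;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (current != null) {
                        records.add(new String[] {id, current.toString()});
                    }
                    // Assumption: the query ID is the first whitespace-delimited header token.
                    id = line.substring(1).trim().split("\\s+")[0];
                    current = new StringBuilder();
                }
                if (current != null) {
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            records.add(new String[] {id, current.toString()});
        }
        return records;
    }

    /** Stage 1 sorts the names; stage 2 keeps records whose ID the binary search finds. */
    static String extract(String[] names, List<String[]> records) {
        String[] sorted = names.clone();
        Arrays.sort(sorted);                         // Stage 1: name sorting
        StringBuilder out = new StringBuilder();
        for (String[] rec : records) {               // Stage 2: sequence extraction
            if (Arrays.binarySearch(sorted, rec[0]) >= 0) {
                out.append(rec[1]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fasta = ">r1 first\nACGT\n>r2\nGGCC\nTTAA\n>r3\nAAAA\n";
        List<String[]> records = loadRecords(new BufferedReader(new StringReader(fasta)));
        // prints the r2 and r3 records, in their original file order
        System.out.print(extract(new String[] {"r3", "r2"}, records));
    }
}
```

A HashSet would give the same result without sorting. The sorted array with binary search is shown only because it matches the "Name Sorting" stage named above.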

Performance Characteristics

BBTools Integration Points

EstherFilter leverages specific BBTools infrastructure components:

Support

For questions and support: