BBTools Pipelines

Production scripts and complete workflows for bioinformatics analysis

📁 34 Pipeline Scripts 🔧 Production Ready 📋 Complete Workflows 📖 HTML Documentation


🧬 Genome Assembly Pipelines

Assembly Pipeline

Complete workflow for preprocessing Illumina 2x150bp reads and genome assembly using multiple assemblers (Tadpole, SPAdes, Megahit). Includes quality control, error correction, and evaluation steps.

Assemble Mitochondria (Illumina)

Specialized pipeline for assembling mitochondrial genomes from Illumina data. Uses coverage-based filtering and targeted assembly approaches optimized for organellar DNA.

Assemble Mitochondria (PacBio)

Pipeline for mitochondrial genome assembly from error-corrected PacBio long reads. Handles the unique challenges of long-read organellar assembly.

Assemble PolyG Isolate

Assembly pipeline for single-isolate genomes whose reads carry polyG tails, an artifact common on certain sequencing platforms.
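For context, polyG tails typically appear on two-color Illumina chemistries, where a cluster that has stopped emitting signal is read as a run of G bases. The sketch below illustrates the idea of removing such a tail in plain Python. It is not taken from the pipeline script; the function name and the `min_len` threshold are assumptions chosen for illustration.

```python
from __future__ import annotations


def trim_polyg_tail(seq: str, qual: str, min_len: int = 10) -> tuple[str, str]:
    """Remove a trailing run of G bases if it is at least min_len long.

    min_len is a hypothetical threshold, not a BBTools default.
    """
    # Count consecutive G bases at the 3' end of the read.
    run = len(seq) - len(seq.rstrip("G"))
    if run >= min_len:
        cut = len(seq) - run
        # Trim the quality string to the same length as the bases.
        return seq[:cut], qual[:cut]
    # Short G runs are left alone because they may be real sequence.
    return seq, qual
```

A genuine G immediately before the artifact is trimmed along with it, which is one reason the threshold should not be set too low.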

Assemble PolyG Metagenome

Metagenome assembly pipeline handling polyG artifacts and the complexities of mixed microbial communities.

🔍 Variant Calling & Analysis Pipelines

Variant Pipeline

Complete workflow for variant calling from Illumina reads, including quality recalibration, error correction, mapping, and SNP/indel detection with VCF output.

Call Insertions

Specialized pipeline for detecting and calling insertion variants from mapped sequencing data.

🦠 COVID-19 Analysis Pipelines

Core SARS-CoV-2 processing pipeline for variant calling and consensus genome generation from Illumina COVID data, supporting both shotgun and amplicon protocols.

Process Corona Wrapper

Batch processing wrapper for COVID-19 analysis, allowing processing of multiple libraries with quality score calibration and systematic variant calling.

Make COVID Summary

Generates comprehensive summary statistics and reports from COVID-19 sequencing analysis results.

COVID Recalibration

Quality score recalibration pipeline specifically optimized for SARS-CoV-2 sequencing data analysis.

✂️ CRISPR Analysis Pipeline

CRISPR Pipeline

Complete workflow for CRISPR detection and analysis in Illumina data, including preprocessing, merging, error correction, and identification of CRISPR arrays, repeats, and spacers.

📥 Data Fetching & Database Pipelines

Downloads and processes NCBI RefSeq complete genomes with taxonomic annotation and proper naming conventions for BBTools compatibility.

Fetch NT Database

Downloads and processes the NCBI nucleotide (NT) database for use with BBTools sketching and taxonomic identification.

Fetch NT Database (Outer)

Extended version of NT database fetching with additional outer sequences for comprehensive nucleotide database coverage.

Downloads and sets up NCBI taxonomy database files required for taxonomic classification in BBTools.

Downloads plasmid sequences from NCBI databases for contamination screening and mobile element analysis.

Downloads plastid/chloroplast genome sequences for plant genomics and organellar assembly projects.

Fetch Prokaryotes by Genus

Downloads prokaryotic genomes organized by taxonomic genus for comparative genomics and reference database creation.

Run RefSeq Protein

Downloads and processes RefSeq protein sequences for protein-level comparative analyses.

Creates MinHash sketches from RefSeq genomes for rapid taxonomic identification and similarity searches.
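To make the idea of a MinHash sketch concrete: a genome is reduced to a small set of its lowest k-mer hash values, and two genomes are compared by how many of those values they share. The following is a minimal bottom-s sketch in Python under stated assumptions. The k-mer length, sketch size, and use of `hashlib.blake2b` are illustrative choices, not the scheme BBTools uses, and reverse complements are ignored for brevity.

```python
import hashlib


def kmer_hashes(seq, k):
    """Yield a 64-bit hash for every k-mer in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        yield int.from_bytes(digest, "big")


def sketch(seq, k=21, size=100):
    """Keep the `size` smallest distinct k-mer hashes (a bottom-s sketch)."""
    return sorted(set(kmer_hashes(seq, k)))[:size]


def jaccard_estimate(a, b, size=100):
    """Estimate Jaccard similarity between two sketches.

    Takes the `size` smallest hashes of the merged sketches and reports
    the fraction that occur in both inputs.
    """
    sa, sb = set(a), set(b)
    union_bottom = sorted(sa | sb)[:size]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in sa and h in sb)
    return shared / len(union_bottom)
```

Because only the sketches are compared, the cost of a comparison depends on the sketch size rather than on genome length, which is what makes this approach suitable for searching large reference collections.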

🌱 SILVA Database Processing

Downloads and processes SILVA ribosomal RNA database (SSU and LSU) with taxonomic annotation for microbial identification and phylogenetic analysis.

Make 15-mers (SILVA)

Generates 15-mer databases from SILVA sequences for rapid ribosomal RNA classification and identification.
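As an illustration of what a 15-mer set contains, the sketch below enumerates every length-15 substring of a sequence and stores each one in canonical form (the lexicographically smaller of the k-mer and its reverse complement). Canonicalization, the handling of `U` as `T`, and the skipping of ambiguous bases are assumptions made for this example and are not confirmed details of the script.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k=15):
    """Return the set of canonical k-mers found in seq."""
    # Treat RNA input as DNA so U and T are interchangeable.
    seq = seq.upper().replace("U", "T")
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= _VALID:
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers
```

Storing canonical forms means a read matches the same database entry regardless of which strand it came from.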

Make Covering Set SSU

Creates a representative covering set of small subunit (16S/18S) ribosomal RNA sequences from SILVA database.

Make Covering Set LSU

Creates a representative covering set of large subunit (23S/28S) ribosomal RNA sequences from SILVA database.

Make Representative Set

Generates representative sequence sets from SILVA database for efficient ribosomal RNA analysis and classification.

🖥️ Server Management Scripts

Start NT Server VM

Launches nucleotide database server virtual machine for BBTools taxonomic services with sketch-based identification.

Start Protein Server VM

Launches protein database server virtual machine for amino acid sequence analysis and taxonomic classification.

Start RefSeq Server VM

Launches RefSeq database server virtual machine for genomic sequence analysis and taxonomic identification services.

Start SILVA Server VM

Launches SILVA ribosomal RNA database server for 16S/18S/23S/28S sequence identification and phylogenetic analysis.

Start Tax Server VM

Launches NCBI taxonomy server virtual machine providing taxonomic information services for BBTools applications.

🔬 Specialized Processing Pipelines

Pipeline for processing Integrated Microbial Genomes (IMG) data from JGI with specialized formatting and analysis workflows.

Process Lane v007

Version 7 of the production lane processing pipeline for high-throughput sequencing data quality control and preprocessing.

Specialized pipeline for RNA sequence processing, trimming, and preparation for downstream analysis.

Make Ribosomal Kmers

Generates kmer databases from ribosomal RNA sequences for rapid microbial identification and contamination detection.

🔍 Quality Control & Testing

Test Platform Quality

Quality assessment pipeline for different sequencing platforms, analyzing read quality metrics and platform-specific characteristics.

Testing and validation pipeline for MinHash sketching functionality and taxonomic identification accuracy.

📋 Pipeline Overview

These production pipelines are workflows that have been used in high-throughput sequencing facilities. Each pipeline chains several BBTools programs in a fixed order to solve a specific bioinformatics task.

Documentation: Each pipeline link opens HTML documentation with parameter details, usage examples, and workflow explanations, so you do not have to read the raw shell scripts.

Getting Started: Most pipelines expect input files named according to specific conventions (e.g., "reads.fq.gz"). The HTML documentation provides detailed usage instructions and requirements for each workflow.
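One low-risk way to satisfy such a naming convention without renaming or copying data is to symlink an existing file to the expected name inside a working directory. The helper below is a sketch of that approach; `stage_input` and its parameters are hypothetical, and only the name "reads.fq.gz" comes from the text above. Check the documentation of the specific pipeline for the names it actually expects.

```python
from pathlib import Path


def stage_input(source, workdir, expected_name="reads.fq.gz"):
    """Symlink `source` into `workdir` under the name a pipeline expects.

    Returns the path of the created symlink.
    """
    source = Path(source).resolve()
    if not source.is_file():
        raise FileNotFoundError(source)

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    target = workdir / expected_name
    # Refuse to clobber an existing file or link from an earlier run.
    if target.exists() or target.is_symlink():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    target.symlink_to(source)
    return target
```

For example, `stage_input("sample_R1.fastq.gz", "run1")` would create `run1/reads.fq.gz` pointing at the original file, after which the pipeline script can be run from inside `run1`.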

Customization: These pipelines serve as templates that can be modified for your specific data types, file structures, and analysis requirements.