BBTools Pipelines

Production scripts and complete workflows for bioinformatics analysis

📁 34 Pipeline Scripts 🔧 Production Ready 📋 Complete Workflows 📖 HTML Documentation

🧬 Genome Assembly Pipelines

Assembly Pipeline
Complete workflow for preprocessing Illumina 2x150bp reads and genome assembly using multiple assemblers (Tadpole, SPAdes, Megahit). Includes quality control, error correction, and evaluation steps.
Assemble Mitochondria (Illumina)
Specialized pipeline for assembling mitochondrial genomes from Illumina data. Uses coverage-based filtering and targeted assembly approaches optimized for organellar DNA.
Assemble Mitochondria (PacBio)
Pipeline for mitochondrial genome assembly from error-corrected PacBio long reads. Handles the unique challenges of long-read organellar assembly.
Assemble PolyG Isolate
Assembly pipeline for single-isolate genomes affected by polyG tail artifacts, which are common on certain sequencing platforms (for example, two-color Illumina instruments such as NextSeq and NovaSeq).
Assemble PolyG Metagenome
Metagenome assembly pipeline handling polyG artifacts and the complexities of mixed microbial communities.

🔍 Variant Calling & Analysis Pipelines

Variant Pipeline
Complete workflow for variant calling from Illumina reads, including quality recalibration, error correction, mapping, and SNP/indel detection with VCF output.
Call Insertions
Specialized pipeline for detecting and calling insertion variants from mapped sequencing data.

🦠 COVID-19 Analysis Pipelines

Process Corona
Core SARS-CoV-2 processing pipeline for variant calling and consensus genome generation from Illumina COVID data, supporting both shotgun and amplicon protocols.
Process Corona Wrapper
Batch wrapper for COVID-19 analysis that processes multiple libraries with quality score calibration and systematic variant calling.
Make COVID Summary
Generates comprehensive summary statistics and reports from COVID-19 sequencing analysis results.
COVID Recalibration
Quality score recalibration pipeline specifically optimized for SARS-CoV-2 sequencing data analysis.

✂️ CRISPR Analysis Pipeline

CRISPR Pipeline
Complete workflow for CRISPR detection and analysis in Illumina data, including preprocessing, merging, error correction, and identification of CRISPR arrays, repeats, and spacers.

📥 Data Fetching & Database Pipelines

Fetch RefSeq
Downloads and processes NCBI RefSeq complete genomes with taxonomic annotation and proper naming conventions for BBTools compatibility.
Fetch NT Database
Downloads and processes the NCBI nucleotide (NT) database for use with BBTools sketching and taxonomic identification.
Fetch NT Database (Outer)
Extended version of NT database fetching with additional outer sequences for comprehensive nucleotide database coverage.
Fetch Taxonomy
Downloads and sets up NCBI taxonomy database files required for taxonomic classification in BBTools.
Fetch Plasmids
Downloads plasmid sequences from NCBI databases for contamination screening and mobile element analysis.
Fetch Plastids
Downloads plastid/chloroplast genome sequences for plant genomics and organellar assembly projects.
Fetch Prokaryotes by Genus
Downloads prokaryotic genomes organized by taxonomic genus for comparative genomics and reference database creation.
Run RefSeq Protein
Downloads and processes RefSeq protein sequences for protein-level comparative analyses.
Sketch RefSeq
Creates MinHash sketches from RefSeq genomes for rapid taxonomic identification and similarity searches.

🌱 SILVA Database Processing

Fetch SILVA
Downloads and processes SILVA ribosomal RNA database (SSU and LSU) with taxonomic annotation for microbial identification and phylogenetic analysis.
Make 15-mers (SILVA)
Generates 15-mer databases from SILVA sequences for rapid ribosomal RNA classification and identification.
Make Covering Set SSU
Creates a representative covering set of small subunit (16S/18S) ribosomal RNA sequences from SILVA database.
Make Covering Set LSU
Creates a representative covering set of large subunit (23S/28S) ribosomal RNA sequences from SILVA database.
Make Representative Set
Generates representative sequence sets from SILVA database for efficient ribosomal RNA analysis and classification.

🖥️ Server Management Scripts

Start NT Server VM
Launches nucleotide database server virtual machine for BBTools taxonomic services with sketch-based identification.
Start Protein Server VM
Launches protein database server virtual machine for amino acid sequence analysis and taxonomic classification.
Start RefSeq Server VM
Launches RefSeq database server virtual machine for genomic sequence analysis and taxonomic identification services.
Start SILVA Server VM
Launches SILVA ribosomal RNA database server for 16S/18S/23S/28S sequence identification and phylogenetic analysis.
Start Tax Server VM
Launches NCBI taxonomy server virtual machine providing taxonomic information services for BBTools applications.

🔬 Specialized Processing Pipelines

Process IMG
Pipeline for processing Integrated Microbial Genomes (IMG) data from JGI with specialized formatting and analysis workflows.
Process Lane v007
Version 7 of the production lane processing pipeline for high-throughput sequencing data quality control and preprocessing.
Cut RNA
Specialized pipeline for RNA sequence processing, trimming, and preparation for downstream analysis.
Make Ribosomal Kmers
Generates kmer databases from ribosomal RNA sequences for rapid microbial identification and contamination detection.

🔍 Quality Control & Testing

Test Platform Quality
Quality assessment pipeline for different sequencing platforms, analyzing read quality metrics and platform-specific characteristics.
Test Sketch
Testing and validation pipeline for MinHash sketching functionality and taxonomic identification accuracy.

📋 Pipeline Overview

These production pipelines are battle-tested workflows used in high-throughput sequencing facilities. Each pipeline chains multiple BBTools programs in a tuned order to solve a specific bioinformatics challenge.

Documentation: Each pipeline link opens HTML documentation with parameter details, usage examples, and workflow explanations, so there is no need to read the raw shell scripts.

Getting Started: Most pipelines expect input files named according to specific conventions (e.g., "reads.fq.gz"). The HTML documentation provides detailed usage instructions and requirements for each workflow.
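As a rough illustration of the naming convention, the sketch below links one of your own files into a working directory under the name a pipeline expects. The function name `stage_reads`, the directory layout, and the default name are assumptions made for this example (only "reads.fq.gz" comes from the text above); check each pipeline's HTML documentation for the names it actually requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): link an input file into a working
# directory under the file name a pipeline script expects.
stage_reads() {
    local src="$1"                      # path to your own gzipped FASTQ file
    local workdir="$2"                  # directory where the pipeline will run
    local expected="${3:-reads.fq.gz}"  # expected name; default is the example from the text

    # Refuse to continue if the input is missing or empty.
    if [ ! -s "$src" ]; then
        echo "error: input '$src' is missing or empty" >&2
        return 1
    fi

    mkdir -p "$workdir" || return 1

    # Resolve to an absolute path so the symlink stays valid from inside workdir.
    local abs
    abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"

    # Create (or replace) the symlink with the expected name.
    ln -sfn "$abs" "$workdir/$expected"
}
```

For example, `stage_reads /data/sample_R1.fastq.gz run1` would create `run1/reads.fq.gz` pointing at the original file, after which a pipeline script could be run from inside `run1`. A symlink is used rather than a copy so that large read files are not duplicated.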

Customization: These pipelines serve as templates that can be modified for your specific data types, file structures, and analysis requirements.
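One common way to adapt a template script, sketched here under assumptions, is to replace hard-coded values with settings that fall back to defaults but can be overridden from the environment. The variable names `READS`, `OUTDIR`, and `THREADS` and the function `pipeline_config` are hypothetical and do not come from the BBTools scripts themselves.

```shell
#!/usr/bin/env bash
# Hypothetical configuration block for a customized pipeline copy.
# Each setting uses the environment value if present, otherwise a default.
pipeline_config() {
    local reads="${READS:-reads.fq.gz}"   # input file name (default from the text's example)
    local outdir="${OUTDIR:-out}"         # output directory (assumed default)
    local threads="${THREADS:-4}"         # thread count (assumed default)

    # Print the resolved settings, one key=value pair per line.
    printf 'reads=%s\noutdir=%s\nthreads=%s\n' "$reads" "$outdir" "$threads"
}
```

Calling `READS=lib7.fq.gz THREADS=16 pipeline_config` prints the overridden values alongside the remaining defaults, which makes it easy to confirm the settings before launching a long run.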