Microbiome Profiling

Streamline Microbiome Profiling with Cosmos-Hub

Cosmos-Hub allows researchers to import their raw sequencing data directly into the platform and run a number of available bioinformatics pipelines for microbiome analysis.

In just a few easy steps, you can run industry-leading bioinformatics pipelines for a wide range of data types:

Import data from your computer, from Illumina BaseSpace, directly from NCBI SRA, or via Command Line Import (CLI).
Select the type of data you would like to import: Shotgun, 16S, or ITS.
Select one of 9 host genomes to run automated host read depletion.
Upload your study metadata.
Choose your pipeline and primers.

Introduction to Shotgun Metagenomic Sequencing

Shotgun metagenomics approaches have enabled microbiome researchers to sequence the entire genetic content of microbial communities including bacteria, viruses, fungi, and protists. This is achieved through whole genome shotgun sequencing platforms that directly analyze genomic DNA to produce reads spanning the complete metagenome without amplification bias. These methods are especially valuable for characterizing microbial taxa from complex environments requiring reference-genome-level taxonomic insights combined with functional profiling.

Best-In-Class F1 Score and Abundance Estimation

Extensive Genome Database Curation & QC - Optimized for Intra-Species Diversity

Multi-Kingdom Profiler - Bacteria, Viruses, Protists & Fungi (30,000+ Species)

Nearest-Neighbor Placement at Strain-Level

12-19X Faster Profiling Speed

Sample Type & Host Agnostic Profiling

Patented with the USPTO and EPO

610+ Citations

The Kepler algorithm within the Cosmos-Hub platform supports comprehensive analysis of sequencing data from any sequencing platform (Illumina, Element, ThermoFisher, BGI, etc.), providing taxonomic profiling, functional annotation, antimicrobial resistance profiling, and virulence factor detection in a single, unified platform. By analyzing the entire genomic content, researchers can simultaneously characterize community structure and functional potential, including metabolic pathways, antimicrobial resistance genes, and virulence factors.

Comprehensive Ecosystem Profiling: Detect all microbial kingdoms, functional genes, and AMR/Virulence Factors in one workflow for complete community characterization.
Functional Insights: Beyond taxonomy, analyze metabolic pathways, antimicrobial resistance, and virulence factors.
Eliminates Amplification Bias: More accurate quantification of microbial relative abundance without PCR bias.
Novel Organism Detection: Capable of detecting low-abundance organisms and novel species missed by targeted approaches.
Multi-Kingdom Resolution: Simultaneous identification of bacteria, viruses, fungi, and protists with functional characterization.

Cosmos-Hub Kepler Pipeline Overview

The Cosmos-Hub platform integrates three powerful profiling pipelines powered by our patented Kepler algorithm technology, delivering unparalleled accuracy and comprehensiveness in metagenomic analysis. This integrated approach combines taxonomic and resistance profiling based on patented technology (patent documents US10108778B2, US20200294628A1, ES2899879T3) with publicly available tools and databases to deliver superior performance across diverse sample types.

This versatility makes the platform an ideal solution for researchers working across multiple domains of microbiome science, and the scientific community values it for its superior performance, as shown in extensive benchmarks.

Kepler Taxonomic Profiling Pipeline

Kepler is a patented multi-kingdom taxonomic profiler with three interwoven stages:

Pre-computational Database Curation: High-quality microbial genomes (>30,000 species across bacteria, viruses, fungi, protists) are cleaned and split into variable-length n-mers, organized into a phylogenetic tree structure with shared biomarkers as the backbone and unique biomarkers as leaves.
K-mer Classification: Sample reads are split into k-mer sets and matched against the database, eliminating 99% of unlikely genomes through biomarker aggregation and coverage depth estimation to generate a shortlist of candidate strains.
Probabilistic Refinement: A Smith-Waterman algorithm compares reads against the remaining 1% of candidates, using Maximum Likelihood Estimation to probabilistically assign contested reads and achieve precise abundance estimates with reduced variance.
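The k-mer classification stage can be pictured with a toy sketch. This is not the Kepler implementation: it assumes a fixed k, exact matching, and a simple "fraction of reads matched" cutoff in place of biomarker aggregation and coverage depth estimation. The names kmers, shortlist, and min_fraction, and the example genomes, are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shortlist(reads, genomes, k=5, min_fraction=0.5):
    """Return the genomes supported by at least min_fraction of the reads.

    reads   -- list of read sequences
    genomes -- dict mapping a (hypothetical) genome name to its sequence
    A read supports a genome when every k-mer of the read occurs in it.
    """
    if not reads:
        return []
    # Pre-compute the k-mer set of every reference genome once.
    index = {name: kmers(seq, k) for name, seq in genomes.items()}
    hits = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        for name, genome_kmers in index.items():
            if read_kmers and read_kmers <= genome_kmers:
                hits[name] += 1
    # Discard genomes with too little read support.
    return sorted(n for n in index if hits[n] / len(reads) >= min_fraction)


genomes = {
    "genome_a": "ACGTACGTTAGC",
    "genome_b": "TTTTGGGGCCCC",
    "genome_c": "ACGTAGGGTTTT",
}
reads = ["ACGTACG", "GTTAGC", "TACGTT"]
print(shortlist(reads, genomes, k=5))  # ['genome_a']
```

A production profiler would tolerate sequencing errors and weight biomarkers by how specific they are. The sketch only shows how cheap k-mer lookups can narrow the candidate list before a more expensive alignment step.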

Functional Profiling Pipeline

The Functional Workflow leverages the Enzyme Commission, MetaCyc Pathways, Pfam, CAZy, and GO Terms databases to characterize the functional potential of microbiome communities. Quality-controlled reads undergo a translated search against the comprehensive UniRef90 protein sequence database. Gene families are mapped and weighted by mapping quality, coverage, and gene sequence length to estimate community-wide weighted gene family abundances.

Key Features:

Metabolic Pathway Reconstruction: Quantification of metabolic pathways (MetaCyc) in the community using established methodologies.
Multi-Database Annotation: UniRef90 gene families regrouped to Enzyme Commission enzymes, Pfam protein domains, CAZy enzymes, and GO Terms.
Normalized Abundance: Total-sum scaling (TSS) normalization produces “Copies per million” units for cross-sample comparisons.
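Total-sum scaling itself is simple arithmetic: each abundance is divided by the sample total and multiplied by one million. The sketch below assumes the weighted gene family abundances have already been computed. The function name to_copies_per_million and the example values are hypothetical.

```python
def to_copies_per_million(abundances):
    """Scale abundances so that they sum to one million (TSS).

    abundances -- dict mapping a feature name to a non-negative abundance
    """
    total = sum(abundances.values())
    if total == 0:
        # Nothing to scale; return zeros rather than dividing by zero.
        return {name: 0.0 for name in abundances}
    return {name: value * 1_000_000 / total for name, value in abundances.items()}


sample = {"gene_family_1": 30.0, "gene_family_2": 10.0, "gene_family_3": 60.0}
for name, cpm in to_copies_per_million(sample).items():
    print(name, cpm)
```

Because every sample is rescaled to the same total, values become comparable across samples with different sequencing depths. They remain relative rather than absolute abundances.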

AMR/VF Profiling Pipeline

The Kepler-AMR/VF Profiling pipeline utilizes advanced k-mer-based algorithms and hierarchical data structures to deliver accurate antimicrobial resistance and virulence factor insights. The pipeline leverages curated nucleotide gene sequences from the ResFinder and VFDB databases, organized into hierarchical tree-like structures with shared and unique biomarker attributes.
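One way to picture shared and unique biomarkers is a two-level tree for a single gene family: k-mers present in every member form the shared backbone, and k-mers found in exactly one member identify that gene. The sketch below is an illustration under those assumptions, not the Kepler data structure. The gene names, sequences, and the function build_biomarkers are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_biomarkers(genes, k):
    """Split the k-mers of a gene family into shared and unique biomarkers.

    genes -- dict mapping a gene name to its nucleotide sequence
    Returns (shared, unique): shared is the set of k-mers found in every
    gene, unique maps each gene to the k-mers found in no other gene.
    """
    per_gene = {name: kmers(seq, k) for name, seq in genes.items()}
    shared = set.intersection(*per_gene.values()) if per_gene else set()
    unique = {}
    for name, own in per_gene.items():
        # Union of the k-mers of all other genes in the family.
        others = set().union(*(s for n, s in per_gene.items() if n != name))
        unique[name] = own - others
    return shared, unique


family = {"gene_x1": "ACGTACGA", "gene_x2": "ACGTACTT"}
shared, unique = build_biomarkers(family, k=4)
print(sorted(shared))             # ['ACGT', 'CGTA', 'GTAC']
print(sorted(unique["gene_x1"]))  # ['ACGA', 'TACG']
```

Reads that match only shared k-mers support the family as a whole, while reads that match unique k-mers can be attributed to a specific gene.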

Kepler Multi-Kingdom Profiler

Step 1: Metagenomic Biomarker Database Curation

Step 2: K-mer Based Taxonomic Classification/Identification

Step 3: Probabilistic Smith-Waterman-Based Abundance Estimation
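The idea behind probabilistic read assignment in Step 3 can be sketched with a standard expectation-maximization (EM) loop, a common way to obtain maximum likelihood abundance estimates when some reads are compatible with several candidates. This is a generic illustration and not the patented Kepler procedure. It assumes per-read likelihoods (for example, derived from alignment scores) are already available, and the names estimate_abundances, strain_a, and strain_b are hypothetical.

```python
def estimate_abundances(likelihoods, n_iter=100):
    """Estimate relative abundances of candidates by EM.

    likelihoods -- one dict per read, mapping a candidate name to the
                   likelihood of that read given the candidate
    Returns a dict mapping each candidate to its estimated abundance.
    """
    names = sorted({name for read in likelihoods for name in read})
    if not names:
        return {}
    # Start from a uniform abundance estimate.
    theta = {name: 1.0 / len(names) for name in names}
    for _ in range(n_iter):
        totals = dict.fromkeys(names, 0.0)
        # E-step: split each read among candidates in proportion to
        # (current abundance) x (read likelihood).
        for read in likelihoods:
            weights = {n: theta[n] * p for n, p in read.items()}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for n, w in weights.items():
                totals[n] += w / norm
        assigned = sum(totals.values())
        if assigned == 0:
            break
        # M-step: the new abundances are the normalized fractional counts.
        theta = {n: totals[n] / assigned for n in names}
    return theta


reads = (
    [{"strain_a": 1.0}] * 3                    # reads unique to strain_a
    + [{"strain_b": 1.0}]                      # read unique to strain_b
    + [{"strain_a": 1.0, "strain_b": 1.0}] * 2  # contested reads
)
print(estimate_abundances(reads))
```

In this example the contested reads are shared out according to the evidence from the unambiguous reads, so the estimate converges to roughly 75% strain_a and 25% strain_b instead of splitting the contested reads evenly.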

Taxonomic Outputs

Kepler employs a host-agnostic curated database (GenBank) containing over 30,000 species and 150,000+ representative genes and genomes across multiple kingdoms:

Bacteria: Comprehensive strain-level resolution with GTDB reference links for detailed community analysis
Viruses: Bacteriophages, eukaryotic viruses, and viral sequences for ecosystem dynamics understanding
Fungi: Detection of yeasts, molds, and environmental fungi essential for food safety studies
Protists: Eukaryotes including parasites and environmental protists for complete ecosystem characterization

Functional Outputs

The functional pipeline provides comprehensive characterization across multiple annotation databases:

MetaCyc Pathways: Reconstructed metabolic pathways with quantitative abundance estimates
Enzyme Commission: Complete enzyme classification and functional potential assessment
CAZy Enzymes: Carbohydrate-active enzyme profiling for metabolic pathway analysis
GO Terms: Gene ontology annotation for comprehensive functional categorization
Pfam Domains: Protein domain identification for detailed functional insights

AMR/VF Outputs

The resistance and virulence profiling delivers clinically relevant insights:

Antimicrobial Resistance: Detection and annotation of resistance determinants across different antibiotic classes
Virulence Factors: Comprehensive pathogenicity assessment through virulence gene detection from the VFDB database
Stratified Reporting: Individual gene-level results and antimicrobial resistance class stratification

Pipeline Performance, Benchmarking & References

Kepler™ has been extensively benchmarked against leading metagenomic profilers including Kraken2/Bracken and MetaPhlAn4 using standardized mock communities and WHO International Reference Reagents. Comprehensive validation across five community standards (ATCC MSA-1003, MSA-2006, MSA-1005, Zymo D6300, D6311) demonstrates Kepler’s superior F1-scores, with consistently higher precision and recall than competing tools.

Key Performance Highlights:

Superior F1-Scores: Kepler achieves F1-scores ranging from 82-100% across all tested community standards.
Consistent Excellence: Outperforms Kraken2/Bracken and MetaPhlAn4 in mock community tests.
Balanced Performance: Optimal combination of precision and recall, particularly excelling in complex community detection.
Validated Methodology: Results published in comprehensive microbiome studies with reproducible datasets.

The benchmarking demonstrates Kepler’s exceptional capability for accurate taxonomic classification while maintaining low false positive rates across diverse microbial community compositions. More information on the method and comprehensive performance metrics can be found in the Cosmos-Hub documentation and our detailed benchmarking whitepaper.
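For readers less familiar with the metric, the F1-score is the harmonic mean of precision and recall. For a mock community it can be computed by comparing the taxa a profiler reports with the taxa known to be in the standard. The sketch below uses placeholder taxon names, not data from the benchmark, and the function name f1_score is hypothetical.

```python
def f1_score(expected, detected):
    """Return (precision, recall, F1) for a set of detected taxa.

    expected -- taxa known to be present in the mock community
    detected -- taxa reported by the profiler
    """
    expected, detected = set(expected), set(detected)
    true_pos = len(expected & detected)    # correctly reported taxa
    false_pos = len(detected - expected)   # reported but not present
    false_neg = len(expected - detected)   # present but missed
    precision = true_pos / (true_pos + false_pos) if detected else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


expected = ["taxon_1", "taxon_2", "taxon_3", "taxon_4"]
detected = ["taxon_1", "taxon_2", "taxon_3", "taxon_9"]
precision, recall, f1 = f1_score(expected, detected)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here one taxon is missed and one is reported in error, so precision, recall, and F1 are all 0.75. A profiler that reports many false positives loses precision, and one that misses low-abundance taxa loses recall, so a high F1-score requires both to be high.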

Figure: Comparison of F1-Scores for five microbiome community standards across three bioinformatics pipelines. Bar plot shows F1-Scores for Kepler, Kraken2/Bracken, and MetaPhlAn4 on five synthetic community standards: (A) ATCC MSA-1003, (B) ATCC MSA-2006, (C) ATCC MSA-1005, (D) Zymo D6300, and (E) Zymo D6311. Kepler consistently achieved the highest F1-Scores across all standards, indicating superior accuracy in classifying both prevalent and rare bacterial taxa. Lower scores for Kraken2/Bracken and MetaPhlAn4 were primarily due to missed or misclassified taxa, particularly those at low abundance.

Download Cosmos-Hub Kepler Whitepaper

Direct Fast Upload

Simply drag and drop your FASTQ files or use the command line to upload your samples directly to the cloud platform in a matter of minutes.

Time to Results

Generate comprehensive taxonomic and functional results in a matter of hours once files are successfully uploaded.

Multi-Kingdom Integration

Simultaneous detection of bacteria, viruses, fungi, protists, and functional elements in a single comprehensive analysis, all with GTDB-based nomenclature and hyperlinked references.

Integrated with the Cosmos-Hub Statistics Toolbox

Once profiling is complete, leverage your metadata to create groups and generate statistical analyses and interactive visualizations.

Patented Technology

Industry-leading Kepler algorithm with proven superior performance and host-agnostic design for any sample type.

Finally, the Cosmos-Hub Support team is on hand to provide personalized parameter recommendations, data interpretation guidance, and technical support across all time zones to ensure your research success.

Book a demo