Microbiome Profiling

Streamline Microbiome Profiling with Cosmos-Hub

Cosmos-Hub allows researchers to import their raw sequencing data directly into the platform and run a number of available bioinformatics pipelines for microbiome analysis.


Kepler Host-Agnostic Taxonomic Profiling

Kepler Multi-Kingdom Profiler

In just a few easy steps, you can run industry-leading bioinformatics pipelines on a wide range of data types:

  • Import data from your computer, from Illumina BaseSpace, directly from NCBI SRA, or via Command Line Import (CLI).
  • Select the type of data you are importing: shotgun, 16S, or ITS.
  • Select one of nine host genomes for automated host read depletion.
  • Upload your study metadata.
  • Choose your pipeline and primers.
 

Kepler Host-Agnostic Taxonomic Profiling

Introduction to Shotgun Metagenomic Sequencing 

 

Shotgun metagenomics approaches have enabled microbiome researchers to sequence the entire genetic content of microbial communities including bacteria, viruses, fungi, and protists. This is achieved through whole genome shotgun sequencing platforms that directly analyze genomic DNA to produce reads spanning the complete metagenome without amplification bias. These methods are especially valuable for characterizing microbial taxa from complex environments requiring reference-genome-level taxonomic insights combined with functional profiling.

The Kepler algorithm within the Cosmos-Hub platform supports comprehensive analysis of data from any sequencing platform (Illumina, Element, ThermoFisher, BGI, etc.), providing taxonomic profiling, functional annotation, antimicrobial resistance profiling, and virulence factor detection in a single, unified platform. By analyzing the entire genomic content, researchers can characterize community structure and functional potential at the same time, including metabolic pathways, antimicrobial resistance genes, and virulence factors.

  • Comprehensive Ecosystem Profiling: Detect all microbial kingdoms, functional genes, and AMR/Virulence Factors in one workflow for complete community characterization.

  • Functional Insights: Beyond taxonomy, analyze metabolic pathways, antimicrobial resistance, and virulence factors.

  • Eliminates Amplification Bias: More accurate quantification of microbial relative abundance without PCR bias.

  • Novel Organism Detection: Capable of detecting low-abundance organisms and novel species missed by targeted approaches.

  • Multi-Kingdom Resolution: Simultaneous identification of bacteria, viruses, fungi, and protists with functional characterization.

Cosmos-Hub Kepler Pipeline Overview

The Cosmos-Hub platform integrates three profiling pipelines powered by our patented Kepler algorithm technology, delivering accurate and comprehensive metagenomic analysis. This integrated approach combines taxonomic profiling and resistance profiling built on patented technology (US Patent No. US10108778B2, US20200294628A1, ES2899879T3) with publicly available tools and databases to deliver strong performance across diverse sample types.

This versatility makes it well suited to researchers working across multiple domains of microbiome science, and its performance is supported by extensive benchmarks.

Kepler Taxonomic Profiling Pipeline

Kepler is a patented multi-kingdom taxonomic profiler with three interwoven stages:

  1. Pre-computational Database Curation: High-quality microbial genomes (>30,000 species across bacteria, viruses, fungi, protists) are cleaned and split into variable-length n-mers, organized into a phylogenetic tree structure with shared biomarkers as the backbone and unique biomarkers as leaves.

  2. K-mer Classification: Sample reads are split into k-mer sets and matched against the database, eliminating 99% of unlikely genomes through biomarker aggregation and coverage depth estimation to generate a shortlist of candidate strains.

  3. Probabilistic Refinement: A Smith-Waterman algorithm compares reads against the remaining 1% of candidates, using Maximum Likelihood Estimation to probabilistically assign contested reads and achieve precise abundance estimates with reduced variance.
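
As a rough illustration of stages 2 and 3, the Python sketch below shortlists candidate genomes by k-mer matching and then apportions ambiguous reads with an expectation-maximization loop, one standard way to compute a maximum likelihood estimate of mixture proportions. It is not Kepler's implementation: the genome sequences, reads, function names, and parameters are invented for the example, and exact k-mer containment stands in for the Smith-Waterman alignment step.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_genomes(read, index, k):
    """Genomes whose k-mer set contains every k-mer of the read (toy matching rule)."""
    read_kmers = kmers(read, k)
    return {name for name, genome_kmers in index.items() if read_kmers <= genome_kmers}


def estimate_abundances(read_candidates, n_iter=50):
    """EM estimate of relative abundances from per-read candidate sets."""
    genomes = sorted(set().union(*read_candidates))
    theta = {g: 1.0 / len(genomes) for g in genomes}  # start from a uniform guess
    for _ in range(n_iter):
        totals = dict.fromkeys(genomes, 0.0)
        for candidates in read_candidates:
            # E-step: split each read among its candidates in proportion to theta
            norm = sum(theta[g] for g in candidates)
            for g in candidates:
                totals[g] += theta[g] / norm
        # M-step: new abundance is the expected share of reads per genome
        theta = {g: totals[g] / len(read_candidates) for g in genomes}
    return theta


# Hypothetical toy data: A and B share the prefix "ACGTAC"; C matches nothing.
K = 4
genomes = {
    "genome_A": "ACGTACGGATCC",
    "genome_B": "ACGTACTTGCAA",
    "genome_C": "TTTTTTTTTTTT",
}
reads = ["ACGTAC", "GGATCC", "CGGATC", "TTGCAA"]

index = {name: kmers(seq, K) for name, seq in genomes.items()}
per_read = [candidate_genomes(read, index, K) for read in reads]
per_read = [c for c in per_read if c]  # drop reads with no candidate genome

abundances = estimate_abundances(per_read)
print(abundances)
```

With these toy inputs, genome_C is never matched and drops out of the shortlist, and the single read shared by genome_A and genome_B is split according to the evidence from uniquely assigned reads, giving genome_A roughly two thirds of the total.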

Functional Profiling Pipeline

The Functional Workflow leverages the Enzyme Commission, MetaCyc Pathways, Pfam, CAZy, and GO Terms databases to characterize the functional potential of microbiome communities. Quality-controlled reads undergo a translated search against the comprehensive UniRef90 protein sequence database. Gene families are mapped and weighted by mapping quality, coverage, and gene sequence length to estimate community-wide weighted gene family abundances.

Key Features:

  • Metabolic Pathway Reconstruction: Quantification of metabolic pathways (MetaCyc) in the community using established methodologies. 

  • Multi-Database Annotation: UniRef90 gene families regrouped to Enzyme Commission numbers, Pfam protein domains, CAZy enzymes, and GO Terms.

  • Normalized Abundance: Total-sum scaling (TSS) normalization produces “Copies per million” units for cross-sample comparisons.
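
To make the normalization concrete, here is a minimal Python sketch of length normalization followed by total-sum scaling to copies per million. The gene family identifiers, read counts, and gene lengths are hypothetical, and the sketch weights by gene length only; it does not model the mapping quality or coverage weighting mentioned above.

```python
def length_normalize(read_counts, gene_lengths_bp):
    """Convert raw read counts to reads per kilobase of gene length."""
    return {
        gene: count / (gene_lengths_bp[gene] / 1000.0)
        for gene, count in read_counts.items()
    }


def to_copies_per_million(abundances):
    """Total-sum scaling: rescale abundances so that they sum to one million."""
    total = sum(abundances.values())
    if total == 0:
        return {gene: 0.0 for gene in abundances}
    return {gene: 1_000_000 * value / total for gene, value in abundances.items()}


# Hypothetical gene families with made-up counts and lengths
read_counts = {"family_1": 300, "family_2": 100}
gene_lengths_bp = {"family_1": 1500, "family_2": 500}

rpk = length_normalize(read_counts, gene_lengths_bp)
cpm = to_copies_per_million(rpk)
print(cpm)
```

In this example family_1 collects three times as many reads as family_2 but is also three times longer, so both end up with the same copies-per-million value. Because every sample sums to one million after scaling, values can be compared across samples with different sequencing depths.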

AMR/VF Profiling Pipeline

The Kepler-AMR/VF Profiling pipeline utilizes advanced k-mer-based algorithms and hierarchical data structures to deliver accurate antimicrobial resistance and virulence factor insights. The pipeline leverages curated nucleotide gene sequences from the ResFinder and VFDB databases, organized into hierarchical tree-like structures with shared and unique biomarker attributes.

Databases and Outputs

Taxonomic Outputs

Kepler employs a host-agnostic curated database (GenBank) containing over 30,000 species and 150,000+ representative genes and genomes across multiple kingdoms: 

  • Bacteria: Comprehensive strain-level resolution with GTDB reference links for detailed community analysis 

  • Viruses: Bacteriophages, eukaryotic viruses, and viral sequences for ecosystem dynamics understanding 

  • Fungi: Detection of yeasts, molds, and environmental fungi essential for food safety studies 

  • Protists: Eukaryotes including parasites and environmental protists for complete ecosystem characterization

Functional Outputs

The functional pipeline provides comprehensive characterization across multiple annotation databases:

  • MetaCyc Pathways: Reconstructed metabolic pathways with quantitative abundance estimates 

  • Enzyme Commission: Complete enzyme classification and functional potential assessment 

  • CAZy Enzymes: Carbohydrate-active enzyme profiling for metabolic pathway analysis 

  • GO Terms: Gene ontology annotation for comprehensive functional categorization 

  • Pfam Domains: Protein domain identification for detailed functional insights

AMR/VF Outputs

The resistance and virulence profiling delivers clinically relevant insights: 

  • Antimicrobial Resistance: Detection and annotation of resistance determinants across different antibiotic classes 

  • Virulence Factors: Comprehensive pathogenicity assessment through virulence gene detection from VFDB database 

  • Stratified Reporting: Individual gene-level results and antimicrobial resistance class stratification 
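
The idea behind stratified reporting can be sketched in a few lines of Python: gene-level abundances are kept as they are and also rolled up to the antibiotic class level. The gene-to-class mapping and abundance values below are illustrative assumptions, not the ResFinder schema or real Cosmos-Hub output.

```python
from collections import defaultdict

# Illustrative mapping from resistance gene to antibiotic class (assumed for the example)
GENE_TO_CLASS = {
    "blaTEM-1": "beta-lactam",
    "mecA": "beta-lactam",
    "tetM": "tetracycline",
    "ermB": "macrolide",
}


def stratify_by_class(gene_abundance, gene_to_class):
    """Sum gene-level abundances into per-class totals."""
    class_totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        # Genes missing from the mapping are grouped rather than silently dropped
        amr_class = gene_to_class.get(gene, "unclassified")
        class_totals[amr_class] += abundance
    return dict(class_totals)


# Made-up gene-level results for one sample
gene_abundance = {"blaTEM-1": 12.0, "mecA": 3.0, "tetM": 5.0}

class_abundance = stratify_by_class(gene_abundance, GENE_TO_CLASS)
print(class_abundance)
```

Keeping both levels lets a report show which individual genes were detected while still supporting class-level comparisons between samples.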

Pipeline Performance, Benchmarking & References

 

Kepler™ has been extensively benchmarked against leading metagenomic profilers including Kraken2/Bracken and MetaPhlAn4 using standardized mock communities and WHO International Reference Reagents. Comprehensive validation across five community standards (ATCC MSA-1003, MSA-2006, MSA-1005, Zymo D6300, D6311) demonstrates Kepler’s superior F1-scores, with consistently higher precision and recall than competing tools.

Key Performance Highlights:

  • Superior F1-Scores: Kepler achieves F1-scores ranging from 82-100% across all tested community standards

  • Consistent Excellence: Outperforms Kraken2/Bracken and MetaPhlAn4 in mock community tests
  • Balanced Performance: Optimal combination of precision and recall, particularly excelling in complex community detection

  • Validated Methodology: Results published in comprehensive microbiome studies with reproducible datasets

The benchmarking demonstrates Kepler’s exceptional capability for accurate taxonomic classification while maintaining low false positive rates across diverse microbial community compositions. More information on the method and comprehensive performance metrics can be found in the Cosmos-Hub documentation and our detailed benchmarking whitepaper.

Figure: Comparison of F1-Scores for five microbiome community standards across three bioinformatics pipelines. Bar plot shows F1-Scores for Kepler, Kraken2/Bracken, and MetaPhlAn4 on five synthetic community standards: (A) ATCC MSA-1003, (B) ATCC MSA-2006, (C) ATCC MSA-1005, (D) Zymo D6300, and (E) Zymo D6311. Kepler consistently achieved the highest F1-Scores across all standards, indicating superior accuracy in classifying both prevalent and rare bacterial taxa. Lower scores for Kraken2/Bracken and MetaPhlAn4 were primarily due to missed or misclassified taxa, particularly those at low abundance.
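
For readers less familiar with the metric, the Python sketch below shows how precision, recall, and F1-score are typically computed for a mock community by comparing the taxa a profiler reports with the taxa known to be in the standard. The species lists are invented for illustration and are not the benchmark data behind the figure.

```python
def precision_recall_f1(expected, detected):
    """Score a set of detected taxa against the known composition of a mock community."""
    true_positives = len(expected & detected)
    false_positives = len(detected - expected)   # reported but not in the standard
    false_negatives = len(expected - detected)   # in the standard but missed

    reported = true_positives + false_positives
    present = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / present if present else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mock community and profiler output
expected = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Staphylococcus aureus",
    "Enterococcus faecalis",
}
detected = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Staphylococcus aureus",
    "Listeria innocua",
}

precision, recall, f1 = precision_recall_f1(expected, detected)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here one expected species is missed and one unexpected species is reported, so precision, recall, and F1 all come out at 0.75. F1 is the harmonic mean of precision and recall, so a profiler scores well only when it keeps both false positives and missed taxa low.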

Why Choose Kepler for Shotgun Metagenomics Analysis?
 
Backed by extensive benchmarking and validation, the integrated Kepler pipeline streamlines microbiome analysis with efficiency and accuracy:
 
 
Direct fast upload

Simply drag and drop your FASTQ files or use the command line to upload your samples directly to the cloud platform in a matter of minutes.

Multi-Kingdom Integration

Simultaneous detection of bacteria, viruses, fungi, protists, and functional elements in a single comprehensive analysis, all with GTDB-based nomenclature and hyperlinked references.

Patented Technology

Industry-leading Kepler algorithm with proven superior performance and host-agnostic design for any sample type.

Time to results

Generate comprehensive taxonomic and functional results in a matter of hours once files are successfully uploaded.

Integrated with the Cosmos-Hub Statistics Toolbox

Once profiling is complete, leverage your metadata to create groups and generate statistical analyses and interactive visualization.

Finally, the Cosmos-Hub Support team is on hand to provide personalized parameter recommendations, data interpretation guidance, and technical support across all time zones to ensure your research success.

Applications and sample types

Decision-making on when and how to implement shotgun metagenomics and Kepler analysis is determined by the sample type, application, and research objectives requiring comprehensive ecosystem characterization.
While shotgun metagenomics can be performed on any sample type (human, non-human, or environmental), the Kepler approach has particular advantages over amplicon-based methods in:

  • Environmental microbiomes requiring comprehensive characterization of diverse viruses, fungi, and protists alongside bacterial communities.
  • Food safety applications demanding simultaneous detection of pathogens, spoilage organisms, and beneficial microbes across multiple kingdoms.
  • Clinical samples requiring accurate pathogen identification with antimicrobial resistance and virulence factor profiling for diagnostic applications.
  • Industrial bioprocessing applications needing contamination detection and process optimization through complete ecosystem monitoring.
  • Novel ecosystem discovery projects targeting unexplored environments where comprehensive profiling reveals previously unknown microbial diversity.
     

Cmbio customers have used shotgun metagenomic sequencing and Kepler analysis to analyze:

  • Environmental samples (soil, water, air)
  • Food and agricultural microbiomes
  • Animal and veterinary samples
  • Low biomass human microbiomes
  • Industrial samples
  • Clinical samples
  • Marine and freshwater ecosystems
  • Wastewater and biogas systems
  • Biofilms
  • Fermentation and bioprocessing samples

Ready to unlock the microbiome?

 

Complete your microbiome study effortlessly with a single, integrated platform containing a customizable, no-code pipeline, data storage and statistics toolbox.

Need high-quality sequencing services to create your data?

Shotgun metagenomics via ONT, Illumina, and Element chemistry is available at Cmbio lab locations in Europe and the US.

Want to run a few samples and test the pipelines?

 Contact us below and a member of the team will reach out to arrange a call to discuss your project.