4 min read

EMU Microbiome Analysis: Platform Overview

EMU Microbiome Analysis: Platform Overview

EMU is a computational pipeline built for high-accuracy species-level profiling in microbial communities using full-length 16S rRNA gene reads. The platform is uniquely optimized for long-read sequencing technologies such as Oxford Nanopore and PacBio (read more about our approach in the Microbiome Profiling section).

By implementing an expectation-maximization algorithm, EMU excels at correcting sequencing error and refining taxonomic classification with every iteration, supporting researchers who require accurate, full-length 16S analyses and species-level abundance estimates. 

 

What Is EMU and What Problem Does It Solve?

 

EMU demonstrates improved species-level profiling accuracy and reduced false positives in comparison to alternative methods. (Curry et al., 2022, Nat Methods).

EMU pipeline directly addresses the challenge of species-level resolution in amplicon-based microbiome analysis. Unlike short-read amplicon assays that restrict taxonomic assignments mostly to the genus level, EMU characterizes diversity across closely related microbes by analyzing the complete ~1,550bp 16S rRNA gene (compared to ~250bp fragment captured with short-read 16S). This additional coverage leads to a more detailed overview of the 16S genes present in the sample, providing species-level taxonomic abundance outputs for any sample type, particularly low-biomass and high-host content (i.e. tissue). .

Researchers can also interpret taxonomic  abundance with improved clarity, thanks to EMU’s sophisticated error model and minimap2-based alignments providing accurate taxonomy lineages, even from error-rich Oxford Nanopore Technologies reads.

Learn more about species-level profiling capabilities and datasets in the Content Library.

 

Figure 1. Pictorial representation of Emu algorithm

 

Why EMU stands out: Species-Level Profiling for Full-Length 16S rRNA Reads

 

  • Delivers species-level precision, even for genomically similar microbes
  • Integrates sophisticated error correction for challenging long-read data
  • Supports research across previously inaccessible microbial environments

By enabling detailed profiling, EMU allows researchers to detect species that remain invisible to other platforms, enhancing species detection in clinical studies, ecological monitoring, and food safety research.

 

Core Features and Analytical Workflow

 

EMU delivers iterative, error-aware taxonomic profiling. Core features of the EMU analytical workflow include:

  • Accepts input files in FASTA/FASTQ (.fa/.fq) format, supporting both single- and paired-end sequences from Nanopore, PacBio, and Illumina devices.
  • Align reads against curated reference data such as Greengenes, SILVA, HOMD, MIDAS,  and more, 
  • Delivers .tsv files containing EMU output relative abundances across multiple taxonomic ranks, including species-level percentages, family-level summaries, and taxonomy collapse options. Note that estimated read counts are likelihood-based probabilities rather than absolute counts. 
  • Unmapped reads or reads assigned to species below detection thresholds are separately reported to maintain transparency and interpretability for downstream analysis.
  • Filter ultra-low-abundance species, focusing on robust results for characterizing diversity.
  • Allows parallel taxonomic classifications, databases with direct taxonomy, and secondary alignments to improve classification accuracy.

For advanced benchmarking and throughput, the Cosmos-Hub platform offers a number of methodologies via the Statistics Toolbox.

 

EMU Pipeline Accuracy, Benchmarking & Performance

 

EMU outperforms alternative tools in species-level resolution and error minimization.

 

Supported Databases & Customization Options

 

EMU supports extensive curatedreference databases.

  • SILVA: Comprehensive, manually curated database ideal for environmental samples
  • GTDB: Genome Taxonomy Database with standardized prokaryotic taxonomy
  • Greengenes2: Updated classification system with improved phylogenetic framework  
  • HOMDB: Specialized for oral microbiome studies
  • MiDAS: Optimized for activated sludge and wastewater treatment systems
  • CHAMP: Human microbiome-focused database
  • UNITE: Fungal ITS sequences for multi-kingdom studies
  • rrnDB: Ribosomal RNA gene database with copy number information
  • Emu Default: Curated combination of rrnDB v5.6 and NCBI 16S RefSeq

Adaptable database selection ensures EMU remains relevant as microbial taxonomy evolves, supporting innovation and routine application alike.

Figure 2. Performance on simulated ONT reads via EMU Pipelines

 

Platform Access,, System Requirements & Usability

 

EMU can be easily run with customized parameters in the Cosmos-Hub web-based GUI, compared to traditional command-line options. Additionally, we have optimized EMU’s performance along with integrated multiple databases, enabling researchers to quickly profile their data and create comparative analyses. 

The only requirements for running EMU via the Cosmos-Hub are high-quality FASTQ files produced by a long-read amplicon workflow (16S, ITS, 18S) and a web-browser. No other computational requirements!

 

Limitations & Cautions

 

By default, EMU restricts ultra-low-abundance taxa (species below 1–10 reads), enabling a focus on robust, interpretable results. Users interested in examining only a portion of their sample or needing to detect every possible community member can change abundance output thresholds or supplement with alternative data processing approaches. 

Key challenges to plan ahead of include:

  • Filters out species below ~1–10 reads, typically limiting ultra-rare detection in highly diverse samples
  • Users needing every community member—including those barely present—should adjust EMU thresholds or supplement with alternative bioinformatics approaches.

Nonetheless, EMU offers the most effective model for robust, reproducible species-level microbiome studies in mainstream applications. 

 

Utilize no-code EMU Pipelines via Cosmos-Hub

 

Setting up EMU locally can be a technical headache. Researchers must install Conda, configure Python ≥3.6, manage databases from various sources, and optimize minimap2 to handle large datasets without memory issues. 

Multi-threading, batch sizes, and output structures all require careful tuning for high-throughput analysis, and integration with R, Snakemake, or cloud workflows adds another layer of complexity. Even after setup, interpreting results demands attention to abundance thresholds and handling ultra-low-abundance taxa, which can be particularly tricky in diverse microbial communities.

Cosmos-Hub eliminates these barriers by delivering powerful, no-code access to EMU pipelines. With Cosmos-Hub:

  • Researchers can run full-length 16S species-level analyses instantly, without installing or configuring ANY software.
  • Resource and memory management is handled automatically, so large datasets and multi-threaded analyses are seamless.
  • Ultra-low-abundance taxa and output thresholds are pre-configured for robust, reproducible results, reducing the risk of misinterpretation.
  • Cosmos-Hub integrates EMU outputs into a streamlined, user-friendly interface, making downstream analysis, visualization, and reporting simple—even for users without programming experience.

Whether you’re a lab scientist, bioinformatician, or industry professional, Cosmos-Hub removes the technical friction so you can focus on what matters most: interpreting microbiome data and generating actionable insights.

Book a Demo or Contact Us for tailored enterprise plans

 

 


EMU Default Database FAQs

 

How is EMU different from other 16S analysis tools?

 

EMU specializes in full-length, species-level profiling and incorporates iterative error correction, reducing false positives compared to genus-level or short-read platforms.

 

What kinds of microbiome environments is EMU validated for?

 

EMU is validated on gut, vaginal, marine, soil, synthetic, environmental, and clinical samples, demonstrating versatility and reliability across research contexts.

 

Is the EM algorithm universal or unique to EMU?

 

EMU’s EM algorithm is specifically configured for full-length 16S rRNA profiling, offering a tailored implementation not commonly found in other platforms. Cmbio has also added additional optimizations to ensure seamless performance, which is discussed further in our documentation (https://docs.cosmosid.com/docs/emu).

Kepler Microbiome Analysis: Tools & Applications

Kepler Microbiome Analysis: Tools & Applications

The Kepler metagenomic profiler is a host-agnostic pipeline focused on taxonomic and functional profiling. Powered by Cosmos-Hub, Kepler is...

Read More
Bioinformatics Pipelines in Microbiome Analysis: A Comprehensive Guide

Bioinformatics Pipelines in Microbiome Analysis: A Comprehensive Guide

A bioinformatics pipeline is the central engine driving microbiome analysis, turning raw sequencing data into interpretable results. Every aspect of...

Read More
DADA2 Software: Microbiome Data Processing

DADA2 Software: Microbiome Data Processing

DADA2 is an open-source R package for accurate sample inference on amplicon sequencing data, outputting fewer spurious sequences while generating...

Read More