Long-Read Amplicon Profiling

Introduction to Long-Read Amplicon Sequencing with Cosmos-Hub

The Cosmos-Hub team have collaborated to implement an optimized pipeline based on Emu, a tool created for estimating microbial relative abundance from full-length microbial amplicon sequences.

Emu is an expectation-maximization pipeline whose probabilistic algorithm considers the entire sample context when assigning taxonomic labels. This approach reduces false positives and improves accuracy for closely related species, which is critical for clinical and research applications where precision matters. It is valued in the scientific community for its superior performance, as reported in multiple benchmarks.
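The idea of using sample context can be illustrated with a minimal expectation-maximization sketch. This is a simplified illustration, not Emu's actual implementation: the function name, the taxon labels and the per-read likelihood values are all assumptions made for the example, and a real pipeline would derive likelihoods from read alignments.

```python
def estimate_abundance(read_likelihoods, max_iterations=100, tolerance=1e-9):
    """Estimate relative abundances with a basic EM loop.

    read_likelihoods: one dict per read mapping a taxon label to an
    assumed likelihood P(read | taxon). All values here are illustrative.
    """
    taxa = sorted({taxon for read in read_likelihoods for taxon in read})
    if not taxa:
        return {}
    # Start from a uniform abundance estimate.
    abundance = {taxon: 1.0 / len(taxa) for taxon in taxa}
    for _ in range(max_iterations):
        expected = {taxon: 0.0 for taxon in taxa}
        for read in read_likelihoods:
            # E-step: weight each candidate taxon by its current abundance.
            weights = {t: abundance[t] * p for t, p in read.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # read carries no usable evidence
            for taxon, weight in weights.items():
                expected[taxon] += weight / norm
        total = sum(expected.values())
        if total == 0.0:
            break
        # M-step: abundances become the normalized expected read counts.
        updated = {t: expected[t] / total for t in taxa}
        change = max(abs(updated[t] - abundance[t]) for t in taxa)
        abundance = updated
        if change < tolerance:
            break
    return abundance


# Eight reads match only taxon_a; two reads match both taxa equally well.
reads = [{"taxon_a": 1.0}] * 8 + [{"taxon_a": 0.5, "taxon_b": 0.5}] * 2
result = estimate_abundance(reads)
```

In this toy input, the two ambiguous reads are pulled almost entirely toward taxon_a because the rest of the sample supports it, which is how this kind of approach avoids reporting a closely related species that has no unambiguous evidence.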

Emu’s core strength lies in its native compatibility with long-read sequencing platforms such as PacBio and Oxford Nanopore Technologies: it takes into account both the read lengths and the error profiles of these sequencers.

The 16S rRNA region is the gold standard for bacterial and archaeal profiling, enabling taxonomic identification and quantification of prokaryotic community structure across diverse environments.

The 18S rRNA region targets microbial eukaryotes, such as protists and fungi, broadly capturing the diversity of the non-bacterial components of microbiomes. Its conserved and variable regions allow characterization of taxonomic composition among eukaryotes, and protocols often include blocking primers to reduce host DNA contamination in host-associated samples such as plant microbiomes.

The ITS region is widely used as the primary genetic marker for fungal community profiling and species identification. Its high sequence variability enables high-resolution discrimination of closely related fungal taxa and complements the information provided by 16S/18S amplicons.

The complete 16S-ITS-23S rRNA operon offers maximal taxonomic resolution for bacteria and archaea, spanning all hypervariable regions and providing robust species- and even subspecies-level classification. This approach improves confidence in taxonomic assignment and allows detection of complex community structures.

Databases and Outputs

The Cmbio team have curated eight databases for users to choose from, based on their amplicon of choice, sample type and personal preferences. For microbiomes without a specialized reference database, users can leverage multiple databases to maximize discovery. The Emu-formatted taxonomic profiling outputs of the pipeline can be exported directly from the platform or plugged directly into the Cosmos-Hub Statistics Toolbox.

Direct FASTQ Upload

Simply drag and drop your FASTQ files, or use the command line, to upload your samples directly to the platform in a matter of minutes.

Parameter-Rich Customization

Flexible parameters, including read length, quality thresholds and database selection, let you tailor analyses. Customize your analysis or use the Cosmos-Hub recommended parameters, depending on your comfort level.
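To make the effect of read length and quality thresholds concrete, here is a minimal sketch of how such a filter could work on FASTQ records. It is not the platform's implementation: the function names, the parameter names (min_length, max_length, min_mean_quality) and the threshold values are hypothetical, and the quality measure is a plain arithmetic mean of Phred+33 scores.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality):
    """Arithmetic mean of Phred scores, assuming Phred+33 encoding."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


def filter_reads(records, min_length, max_length, min_mean_quality):
    """Keep reads inside the length window and above the quality cutoff."""
    for name, sequence, quality in records:
        if not sequence:
            continue  # skip empty records
        if not min_length <= len(sequence) <= max_length:
            continue
        if mean_phred(quality) < min_mean_quality:
            continue
        yield name, sequence, quality


# Toy input: r1 passes, r2 is too short, r3 has very low quality.
fastq_text = (
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACG\n+\nIII\n"
    "@r3\nACGTACGT\n+\n!!!!!!!!\n"
)
kept = list(filter_reads(read_fastq(io.StringIO(fastq_text)), 5, 10, 10))
kept_names = [name for name, _, _ in kept]
```

For full-length amplicons, a length window around the expected amplicon size removes truncated reads and chimeric concatemers, while the quality cutoff removes reads too noisy to classify reliably.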

Performance Enhancement

The pipeline was developed in collaboration with our long-read sequencing and analysis center of excellence in Aalborg (formerly DNASense).

Time to Results

Generate results in a matter of hours once the files are successfully uploaded.

Multi-Database Integration

Choose from a number of specialized databases to suit your sample type and study question for versatile and precise analyses.

Statistics Toolbox Integration

Once profiling is complete, leverage your metadata to create groups and generate statistical analyses and interactive visualizations.
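As a rough illustration of metadata-driven grouping, the sketch below computes a Shannon diversity index per sample and averages it within each metadata group. The choice of Shannon diversity, the function names, the sample names and the abundance values are all assumptions made for this example; they do not describe the Statistics Toolbox itself.

```python
import math
from collections import defaultdict


def shannon_index(abundances):
    """Shannon diversity (natural log) of a collection of abundances."""
    values = list(abundances)
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def mean_diversity_by_group(profiles, metadata):
    """Average per-sample Shannon diversity within each metadata group.

    profiles: {sample: {taxon: relative abundance}}
    metadata: {sample: group label}
    """
    grouped = defaultdict(list)
    for sample, abundances in profiles.items():
        grouped[metadata[sample]].append(shannon_index(abundances.values()))
    return {group: sum(vals) / len(vals) for group, vals in grouped.items()}


# Made-up profiles and group labels, for illustration only.
profiles = {
    "sample_1": {"taxon_a": 0.5, "taxon_b": 0.5},
    "sample_2": {"taxon_a": 1.0},
}
metadata = {"sample_1": "treated", "sample_2": "control"}
diversity = mean_diversity_by_group(profiles, metadata)
```

The same grouping pattern extends to other per-sample summaries, such as the mean relative abundance of a taxon of interest per group.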

Applications and Sample Types

When and how to implement long-read amplicon sequencing and analysis depends on the sample type, the application and the kingdom(s) of the organisms of interest.

Whilst long-read amplicon sequencing can be performed on any sample type (human, non-human or environmental), this approach has particular advantages over shotgun metagenomics in:

High-host-content samples, such as human tissue or plant microbiomes.
Microbiome discovery in novel sample types and poorly characterized environments, such as swine fecal samples, where limited reference genome coverage may negatively impact shotgun metagenomics.
Low-biomass samples, such as skin microbiome samples, certain foodstuffs or clinical samples for infectious disease diagnostics.
Fungal- or archaea-dominated samples.