Metagenomic classification

Analyzing whole genome sequencing (WGS) and targeted sequencing (e.g., 16S) data on One Codex

Written by Denise Lynch
Updated over a week ago


As soon as a sample is uploaded, it is automatically compared against the One Codex Database to identify the microbes it contains (bacteria, viruses, fungi, protists, and archaea). One Codex also automatically detects whether a sample contains 16S, 18S, or ITS sequencing data and, for these samples, also runs an analysis against the curated Targeted Loci Database.

You can view the results of that analysis by clicking the "View Results" button for any given sample.

Organisms are identified on the basis of distinctive nucleotide sequences. Sequences that are unique to a particular grouping of organisms (e.g., a genus or species) can be used to identify members of that group.
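To make the idea concrete, here is a toy Python sketch of classification by distinctive subsequences (k-mers). It illustrates the general principle under stated assumptions and is not One Codex's implementation: the k-mer length, the function names, and the rule of simply discarding k-mers shared by more than one taxon (rather than assigning them to a higher taxonomic level) are all hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Optional, Set

K = 5  # toy k-mer length chosen for readability (assumption)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unique_index(references: Dict[str, str], k: int = K) -> Dict[str, str]:
    """Map each k-mer to the single taxon it occurs in.

    K-mers found in more than one taxon are not distinctive, so this
    simplified version drops them entirely.
    """
    owner: Dict[str, str] = {}
    shared: Set[str] = set()
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            if km in owner and owner[km] != taxon:
                shared.add(km)
            else:
                owner[km] = taxon
    return {km: t for km, t in owner.items() if km not in shared}


def classify_read(read: str, index: Dict[str, str], k: int = K) -> Optional[str]:
    """Assign a read to the taxon with the most distinctive k-mer hits,
    or return None if no distinctive k-mer is found."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

For example, with refs = {"taxonA": "ACGTACGTTTGACCA", "taxonB": "TTTTGGGGCCCCAAAA"}, classify_read("ACGTACGT", build_unique_index(refs)) returns "taxonA", because every 5-mer in that read occurs only in taxonA's sequence.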

This page will walk through each panel of the results display, and then fill in more details on the advanced features and underlying bioinformatics.

Example analysis

Below is an example of a metagenomic classification analysis on One Codex:

The organisms we detect are displayed in each of the panels of the analysis page. The top panel shows the species that are detected at High Abundance (>25% of the sample), Medium Abundance (5-25% of the sample), and Low Abundance (0.5-5% of the sample).
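The tier boundaries can be written as a small Python function. This is a sketch: the function name is hypothetical, and how a value falling exactly on a boundary (5% or 25%) is binned is an assumption, since the panel description does not specify it.

```python
from typing import Optional


def abundance_tier(percent: float) -> Optional[str]:
    """Bin a relative abundance (percent of the sample) into the top-panel tiers.

    Thresholds follow the panel description; boundary handling is assumed.
    """
    if percent > 25:
        return "High"
    if percent >= 5:
        return "Medium"
    if percent >= 0.5:
        return "Low"
    return None  # below 0.5%: presumably not listed in the top panel
```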

The second panel shows the organisms laid out in a "donut chart," with a drop-down menu on the right that allows the user to switch between displaying Abundances and Readcounts (more details on those statistics here).

The third panel displays the results on a taxonomic tree, with a slider that controls the display threshold.
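One way to picture a display threshold on a tree is as a pruning rule. The Python sketch below assumes, hypothetically, that the threshold is a minimum read count applied to each clade, where a clade's count includes reads assigned to all of its descendants; the actual slider may use a different statistic, and the class and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaxonNode:
    name: str
    reads: int = 0  # reads assigned directly to this taxon
    children: List["TaxonNode"] = field(default_factory=list)


def clade_reads(node: TaxonNode) -> int:
    """Reads assigned to this node plus all of its descendants."""
    return node.reads + sum(clade_reads(c) for c in node.children)


def prune(node: TaxonNode, min_reads: int) -> Optional[TaxonNode]:
    """Return a pruned copy of the tree that keeps only clades with at
    least min_reads reads (hypothetical threshold rule)."""
    if clade_reads(node) < min_reads:
        return None
    kept = []
    for child in node.children:
        pruned = prune(child, min_reads)
        if pruned is not None:
            kept.append(pruned)
    return TaxonNode(node.name, node.reads, kept)
```

The recursion recomputes clade totals at every level, which is fine for a small example; a larger tree would call for computing them once in a single post-order pass.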

The fourth and final panel shows the results as a data table presenting the amount of the sample that was assigned to each organism. You can filter the table using the search bar at the top-left, and you can sort the table by using the arrows at the top of each column. In addition, you can download the data in this table by using the "Save" button at the top-right. Note: when you filter the table, the downloaded results will also be filtered.

Switching between analyses

Users can switch between results for a single dataset by selecting different analyses in the top-right of the screen.

Every sample that is uploaded to One Codex is automatically analyzed against the One Codex Database, which contains ~115K complete microbial genomes, including bacteria, viruses, fungi, protists, and archaea. A list of the organisms included in these databases is available here.

For analysis of 16S (and other amplicon) data, we provide a Targeted Loci Database, which contains commonly used full-length marker genes, including 16S, 18S, ITS, and others. You can find more details on this analysis here (Targeted Loci Analysis).

Abundances & readcounts

The results of Metagenomic Classification are presented in terms of readcounts and abundances. Readcounts are presented for every dataset and are simply the number of reads containing sequences that are specific to a given group of organisms. Many samples also display an abundance, which corresponds to the relative number of genome copies present for a given species. Note: Species-level abundances are only presented for samples containing reads that have been randomly sampled from a mixture of genomic DNA (i.e., WGS, not amplicon data such as 16S).
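The distinction matters because a longer genome yields more reads per copy. A common way to turn read counts into relative genome copies is to divide each species' reads by its genome length and rescale so the values sum to 1. The Python sketch below shows that calculation; it illustrates the general idea and is not necessarily the exact method One Codex uses, and the function and argument names are hypothetical.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       genome_lengths: Dict[str, int]) -> Dict[str, float]:
    """Estimate relative genome copies from read counts.

    Reads per base of genome approximate sequencing depth, which is
    proportional to copy number; normalizing makes the values sum to 1.
    """
    depth = {sp: read_counts[sp] / genome_lengths[sp] for sp in read_counts}
    total = sum(depth.values())
    if total == 0:
        return {sp: 0.0 for sp in depth}
    return {sp: d / total for sp, d in depth.items()}
```

With 1,000 reads each for species A (2 Mb genome) and species B (4 Mb genome), the readcounts split 50/50, but relative_abundance gives about 0.667 for A and 0.333 for B, because B's genome is twice as long and so produces twice as many reads per copy.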

Accessing raw data

If you want to access the raw data listing the taxonomic assignment that was made for each individual sequence (i.e., each read or contig), you can download it by clicking "Read-Level Data" at the top of the page. You can find a description of the format of the resulting gzipped TSV file here.
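As a starting point for working with that file, the Python sketch below streams the gzipped TSV using only the standard library and tallies how many reads were assigned to each taxon. The column name "Tax ID" is a placeholder assumption, not the documented header; check the format description and pass the correct column name.

```python
import csv
import gzip
from collections import Counter


def count_assignments(path: str, taxid_column: str = "Tax ID") -> Counter:
    """Count reads (or contigs) assigned to each taxon in a gzipped TSV.

    taxid_column is a hypothetical default; replace it with the real header.
    """
    counts: Counter = Counter()
    # Open in text mode so csv can parse the decompressed lines directly.
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            counts[row[taxid_column]] += 1
    return counts
```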

For additional bioinformatics details, please also see the technical appendices on the One Codex Database and Targeted Loci Database.
