One Codex

The One Codex Database (as of January 2025) consists of ~148K complete microbial genomes, including approximately:

- 25K viral species (72K viral genomes)
- 42K bacterial species (71K bacterial genomes)
- 1.3K eukaryote species including fungi (2K eukaryote genomes)
- 2K archaeal species (2.2K archaeal genomes)
- and 1.6K mouse gut Metagenome-Assembled Genomes (MAGs). 

The human and mouse genomes are included to screen out host reads, and you can find the complete list of organisms in our <a href="https://app.onecodex.com/references" rel="nofollow noopener noreferrer" target="_blank">list of references</a>. The database is assembled from both public and private sources, with a combination of automated and manual curation steps to remove low-quality or mislabeled records. Analysis against the One Codex Database provides:

1. Highly accurate identification of microbes from genomic sequence data
2. Precise quantification of microbial abundance using whole-genome shotgun (WGS) sequencing
3. Community-wide characterization of complex microbial mixtures, including the human microbiome

Analyzing samples against the One Codex Database

Comparing a microbial sample against the One Codex Database consists of three sequential steps:

1. K-mer based classification. Every individual sequence (NGS read or contig) is compared against the One Codex Database by exact alignment using k-mers where k=31 (see <a href="https://academic.oup.com/bioinformatics/article/29/18/2253/240111/Scalable-metagenomic-taxonomy-classification-using" rel="nofollow noopener noreferrer" target="_blank">Ames et al., 2013</a> and <a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r46" rel="nofollow noopener noreferrer" target="_blank">Wood et al., 2014</a> for details on k-mer based classification).
2. Artifact filtering. Based on the relative frequency of unique k-mers in the sample, sequencing artifacts are filtered out of the sample. This filtering should run automatically on most WGS data and does not eliminate low abundance or low confidence hits, only probable sequencing or reference genome artifacts.[1]
3. Species-level abundance estimation. The relative abundance of each microbial species is estimated based on the depth and coverage of sequencing across every available reference genome.
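
The three steps can be illustrated with a toy Python sketch. This is not the One Codex implementation: the function names, the `min_unique_fraction` threshold, and the random toy genomes are all assumptions made for illustration. It also simplifies the method by counting only k-mers unique to a single taxon, where classifiers such as Kraken resolve shared k-mers with a lowest-common-ancestor scheme, and by estimating abundance from mean depth alone.

```python
import random
from collections import Counter, defaultdict

K = 31  # k-mer length stated in the text


def kmers(seq, k=K):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k=K):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = defaultdict(set)
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index[km].add(taxon)
    return index


def classify(read, index, k=K):
    """Step 1: return the taxon with the most taxon-unique exact k-mer hits, or None."""
    hits = Counter()
    for km in kmers(read, k):
        taxa = index.get(km, ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa (simplification)
            hits[next(iter(taxa))] += 1
    return hits.most_common(1)[0][0] if hits else None


def relative_abundance(reads, references, index, k=K, min_unique_fraction=0.05):
    """Steps 2 and 3: drop weakly supported taxa, then normalize mean depth."""
    bases = Counter()          # classified bases per taxon
    seen = defaultdict(set)    # taxon-unique k-mers observed in the sample
    for read in reads:
        taxon = classify(read, index, k)
        if taxon is None:
            continue
        bases[taxon] += len(read)
        seen[taxon].update(km for km in kmers(read, k) if index.get(km) == {taxon})
    depth = {}
    for taxon, n_bases in bases.items():
        unique_total = sum(1 for km in set(kmers(references[taxon], k))
                           if index[km] == {taxon})
        # Hypothetical artifact filter: require a minimum share of the
        # taxon's unique reference k-mers to be observed in the sample.
        if unique_total and len(seen[taxon]) / unique_total >= min_unique_fraction:
            depth[taxon] = n_bases / len(references[taxon])
    total = sum(depth.values())
    return {t: d / total for t, d in depth.items()} if total else {}


# Toy data: two random 300 bp "genomes" and reads sampled 3:1 between them.
rng = random.Random(0)
references = {name: "".join(rng.choice("ACGT") for _ in range(300))
              for name in ("taxon_a", "taxon_b")}
index = build_index(references)


def sample_reads(genome, n, length=100):
    """Cut n error-free reads of the given length from random positions."""
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


reads = sample_reads(references["taxon_a"], 30) + sample_reads(references["taxon_b"], 10)
abundance = relative_abundance(reads, references, index)
print({taxon: round(value, 2) for taxon, value in abundance.items()})
```

Because both toy genomes have the same length and the reads are sampled 3:1, the estimate comes out at roughly 75% `taxon_a` and 25% `taxon_b`. A taxon supported by only a single 31-base read would fall below the illustrative threshold and be dropped, which mirrors the intent of the artifact filter without reproducing its actual criteria.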

[1] Note: Users can access results without artifact filtering on the individual analysis pages by clicking "view unfiltered results". These raw results are not recommended and should only be used for diagnostic purposes or for comparison to pure read-level classifiers, e.g., Kraken. Please feel free to contact us if you believe you have a sample where an important taxon is not displayed in the filtered result set.

Increased accuracy with the One Codex Database

The figure below compares the latest version of One Codex against Kraken and MetaPhlAn using an in silico simulated sample from Segata et al. (2012). It shows how the latest version of the platform provides extremely accurate relative abundance estimates, while also substantially limiting the number of false positives when compared to previous k-mer based methods like Kraken.

References

Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics. 2013;29(18):2253-60.

Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using clade-specific marker genes. Nat Methods. 2012 Jun 10;9(8):811-4. doi: 10.1038/nmeth.2066.

Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014 Mar 3;15(3):R46. doi: 10.1186/gb-2014-15-3-r46.
