A: One Codex classifies nucleotide reads by checking for exact k-mer matches against our database of bacterial, viral, and fungal genomes. By default, we analyze all samples against our expanded One Codex Database. You may also run an analysis against the National Center for Biotechnology Information's (NCBI) Reference Sequence Database (RefSeq). Further details on the exact organisms included in both databases can be found on the references page.
Our platform then matches every overlapping k-mer in a given read to the most specific organism and taxonomic level possible. Because not all k-mers are unique to an individual species or strain, each k-mer is classified to the lowest common ancestor (LCA) of the genomes that contain it within a taxonomic/phylogenetic tree. Finally, we aggregate the individual k-mer matches across the read and assign it the most specific, consistent taxonomic ID: the taxon at the end of the highest-weighted root-to-leaf path of k-mer matches through the taxonomic tree. More details are available on our documentation site.
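To make the LCA and root-to-leaf steps concrete, here is a minimal Python sketch of the general technique. It is an illustration under stated assumptions, not One Codex's actual implementation: the toy taxonomy, the genome strings, the k-mer length of 4, and every function name are hypothetical. It also ignores reverse complements, and it resolves tied paths by falling back to their LCA, which is one common choice rather than something confirmed above.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "Salmonella": "Bacteria",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def lca(taxa):
    """Lowest common ancestor: the deepest node shared by all lineages."""
    common = None
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def kmers(seq, k):
    """All overlapping k-mers of `seq` (empty if the sequence is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    seen = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            seen.setdefault(km, set()).add(taxon)
    return {km: lca(taxa) for km, taxa in seen.items()}


def best_path_taxon(hits):
    """Pick the hit taxon whose root-to-taxon path carries the most k-mer hits.

    Assumption: tied paths are resolved by taking the LCA of the tied taxa.
    """
    scores = {t: sum(hits.get(a, 0) for a in lineage(t)) for t in hits}
    top = max(scores.values())
    return lca([t for t, s in scores.items() if s == top])


def classify_read(read, index, k):
    """Classify one read; returns None when no k-mer matches the index."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    return best_path_taxon(hits) if hits else None


# Made-up sequences; k = 4 is only for readability of the example.
GENOMES = {
    "E. coli": "ACGTACGGA",
    "E. albertii": "ACGTTTGCA",
    "Salmonella": "GGGACGTCC",
}
K = 4
INDEX = build_index(GENOMES, K)

print(INDEX["ACGT"])                          # Bacteria (shared by all three genomes)
print(classify_read("ACGTACGG", INDEX, K))    # E. coli
```

In this toy example the k-mer ACGT occurs in all three genomes, so on its own it can only be placed at Bacteria. The read still resolves to E. coli because its four remaining k-mers are unique to that genome, and the shared k-mer adds weight to the same root-to-leaf path instead of contradicting it.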
Please note that input reads or sequences must be at least 32 base pairs in length.
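If you would like to check your data against this minimum before submitting it, a small pre-filter along the following lines may help. This is only a sketch: the function name and the (name, sequence) tuple format are hypothetical, and it makes no claim about how the platform itself handles reads shorter than 32 base pairs.

```python
MIN_READ_LENGTH = 32  # minimum length stated above, in base pairs


def partition_by_length(reads, min_length=MIN_READ_LENGTH):
    """Split (name, sequence) pairs into usable and too-short lists."""
    usable, too_short = [], []
    for name, seq in reads:
        (usable if len(seq) >= min_length else too_short).append((name, seq))
    return usable, too_short


# Hypothetical reads of 40, 32, and 20 bases.
reads = [
    ("read1", "ACGT" * 10),
    ("read2", "ACGT" * 8),
    ("read3", "ACGT" * 5),
]
usable, too_short = partition_by_length(reads)
print([name for name, _ in usable])     # ['read1', 'read2']
print([name for name, _ in too_short])  # ['read3']
```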