Sequencing and Analysis Methods

Below you will find some proposed text that you may wish to use when publishing your sequence data generated by One Codex:

Microbiome Analysis. One Codex Reformat Kits containing barcoded sample collection tubes were provided by One Codex, Inc. Fecal samples were placed in individual tubes containing DNA stabilization buffer to ensure reproducibility, stability, and traceability, and were shipped to One Codex, Inc. for DNA extraction, library preparation, and sequencing. DNA extraction was optimized and fully automated using a robust process for reproducible extraction of inhibitor-free, high molecular weight genomic DNA that captures the true microbial diversity of stool samples. After DNA extraction and quality control (QC), genomic DNA was converted into sequencing libraries using a method optimized for minimal bias. Unique dual indexed (UDI) adapters were used to minimize misassignment of reads (and therefore organisms) between samples. After QC, the libraries were shotgun sequenced to a depth of 2 million 2x150 bp read pairs, which enables species- and strain-level taxonomic resolution. Sequencing data were automatically uploaded to the One Codex [1] analysis software and analyzed against the One Codex database, which consists of more than 148K whole microbial reference genomes. The classification results were filtered through several statistical post-processing steps designed to eliminate false positive results caused by contamination or sequencing artifacts.
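To put the stated depth in perspective, the arithmetic below converts 2 million 2x150 bp read pairs into total sequenced bases and then estimates the average fold coverage of a single genome. This is an illustrative sketch only: the 1% relative abundance and 5 Mbp genome size are hypothetical values, and the estimate assumes reads are sampled in proportion to each organism's share of the sequenced DNA, ignoring host reads and reads removed during QC.

```python
def sequencing_yield_bp(read_pairs: int, read_length_bp: int) -> int:
    """Total sequenced bases for paired-end data: each pair contributes two reads."""
    return read_pairs * 2 * read_length_bp


def expected_fold_coverage(total_bp: float, relative_abundance: float,
                           genome_size_bp: float) -> float:
    """Average depth over one genome, assuming reads are drawn in proportion
    to that organism's share of the sequenced DNA (hypothetical model)."""
    return total_bp * relative_abundance / genome_size_bp


total_bp = sequencing_yield_bp(read_pairs=2_000_000, read_length_bp=150)
print(f"{total_bp:,} bp ({total_bp / 1e9:.1f} Gbp)")  # 600,000,000 bp (0.6 Gbp)

# Hypothetical organism: 1% of the sequenced DNA, 5 Mbp genome.
cov = expected_fold_coverage(total_bp, relative_abundance=0.01, genome_size_bp=5_000_000)
print(f"~{cov:.1f}x coverage")  # ~1.2x coverage
```

Under these assumptions, a species making up 1% of the sequenced DNA would be covered only about 1.2-fold on average, which gives a rough sense of how depth constrains what can be resolved for low-abundance organisms.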

Here are some more details for your reference on how we process and analyze samples:

Reference microbial genomes:

The One Codex Database consists of ~148K complete microbial genomes, including 71K distinct bacterial genomes, 72K viral genomes, and thousands of archaeal and eukaryotic genomes. Human and mouse genomes are included to screen out host reads, and you can find the complete list of organisms in our list of references. The database is assembled from both public and private sources, with a combination of automated and manual curation steps to remove low-quality or mislabeled records.

DNA extraction, library prep, and sequencing:

DNA extraction is performed using the Qiagen DNeasy 96 PowerSoil Pro QIAcube HT extraction kit and protocol. Library preparation is performed using the KAPA HyperPlus library preparation protocol. Sequencing is performed using the Illumina NextSeq 2000 instrument and protocol. Raw data (FASTQ files) are analyzed against the One Codex Database as described above.
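Because the raw data are FASTQ files, you can check the realized read count and sequencing yield yourself. The sketch below is a minimal, standard-library FASTQ reader that assumes the common four-line record layout (header, sequence, "+" separator, quality) and optional gzip compression; the function names are hypothetical and not part of any One Codex tool. If read pairs are delivered as separate R1/R2 files (a common Illumina convention, assumed here rather than stated in the text), the read count of one file equals the number of read pairs.

```python
import gzip
import io
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between or after records
        # next(it, "") keeps StopIteration from leaking out of the generator
        # on a truncated file; the checks below then reject the record.
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield header[1:], seq, qual


def fastq_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of reads, total bases) for a FASTQ stream."""
    n_reads = n_bases = 0
    for _, seq, _ in read_fastq(lines):
        n_reads += 1
        n_bases += len(seq)
    return n_reads, n_bases


def open_fastq(path: str):
    """Open a plain or gzip-compressed (.gz) FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n")
print(fastq_stats(demo))  # (2, 9)
```

For a real file, pass the handle returned by open_fastq (ideally inside a with statement) to fastq_stats and compare the read count with the nominal 2 million pairs.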

Taxonomic Classification

Comparing a microbial sample against the One Codex database consists of three sequential steps:

K-mer based classification. Every individual sequence (NGS read or contig) is compared against the One Codex database by exact alignment using k-mers where k=31 (see Ames et al., 2013 [2] and Wood et al., 2014 [3] for details on k-mer based classification).
Artifact filtering. Sequencing artifacts are filtered out based on the relative frequency of unique k-mers in the sample. This filtering is applied automatically to most whole-genome shotgun (WGS) data and does not eliminate low-abundance or low-confidence hits, only probable sequencing or reference genome artifacts.
Species-level abundance estimation. The relative abundance of each microbial species is estimated based on the depth and coverage of sequencing across every available reference genome.

References

[1] Minot SS, Krumm N, Greenfield NB. One Codex: A Sensitive and Accurate Data Platform for Genomic Microbial Identification. bioRxiv. 2015. doi: 10.1101/027607

[2] Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE. Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics. 2013;29(18):2253-60. doi: 10.1093/bioinformatics/btt389

[3] Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014 Mar 3;15(3):R46. doi: 10.1186/gb-2014-15-3-r46