In collaboration with the team at Twist Bioscience, we've developed rapid workflows on the One Codex platform designed specifically to analyze data from the Twist Respiratory Virus Research Panel and the Twist Comprehensive Viral Research Panel. Both of these workflows are included with the purchase of the respective product from Twist Bioscience, and are designed to help you take your raw sequence data and generate a detailed yet easy-to-interpret report.
Both workflows share a similar methodology, with a couple of small differences that are outlined below. Principally, these analyses align sequence reads against a set of curated viral reference genomes to estimate the depth and breadth of coverage, as well as the identity of the reads to each viral genome detected in the sample.
Twist Respiratory Virus Research Panel
The analysis for the Twist Respiratory Virus Research Panel uses a set of 29 viral reference genomes curated by the team at Twist Bioscience. A complete list of viruses included in the panel is available here. The One Codex platform automatically aligns your sequence data against these viral reference genomes, before summarizing the results into a convenient PDF report.
Twist Comprehensive Viral Research Panel
When you upload data from the Twist Comprehensive Viral Research Panel, we first analyze it against our One Codex Database. This is a large, curated reference database consisting of over 115k genomes, including over 48k whole viral reference genomes. The One Codex Database uses a k-mer based classification method, which lets us rapidly and accurately identify any of the more than 3,153 viruses in the Panel. You can read more about our One Codex Database analysis here.
Once we've identified which viruses are present in your sample, we gather the reference genomes for those present at more than 0.1% relative abundance. From there, we perform a secondary sequence alignment step to provide more detailed coverage, depth, and identity metrics for each of those viruses.
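The abundance filter described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the example virus names and values are hypothetical, and relative abundance is assumed to be expressed as a fraction (so 0.1% is 0.001).

```python
def select_viruses_for_alignment(abundances, min_abundance=0.001):
    """Return the viruses whose relative abundance is above the cutoff.

    `abundances` maps a virus name to its relative abundance as a fraction
    (hypothetical layout). The comparison is strict, matching
    "more than 0.1%" in the text.
    """
    return [name for name, fraction in abundances.items() if fraction > min_abundance]


# Made-up example values, for illustration only.
abundances = {
    "Virus A": 0.62,
    "Virus B": 0.0015,
    "Virus C": 0.0004,
}
selected = select_viruses_for_alignment(abundances)
print(selected)
```

In this example only "Virus A" and "Virus B" would move on to the secondary alignment step, since "Virus C" is below the cutoff.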
Both workflows use minimap2 as the sequence alignment tool. We run minimap2 with default alignment settings, using the built-in preset for aligning short input reads. Since there is significant homology between many of the viruses in the panel, we allow for multi-mapping by retaining all secondary alignments of equal quality to the primary alignment.
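For readers who want to approximate this step locally, the sketch below builds a comparable minimap2 command line in Python. The exact options used on the One Codex platform are not stated here, so treat the flag choices as assumptions: `-x sr` is minimap2's short-read preset, `-a` requests SAM output, and `-p 1.0` (the minimum secondary-to-primary score ratio) is one possible way to keep only secondary alignments that score as well as the primary. The function name and file names are hypothetical.

```python
import shlex


def build_minimap2_command(reference_fasta, reads_fastq, mate_fastq=None):
    """Build a minimap2 command for short reads (a sketch, not the platform's exact call)."""
    cmd = [
        "minimap2",
        "-a",                # write alignments in SAM format
        "-x", "sr",          # built-in preset for short reads
        "--secondary=yes",   # report secondary alignments
        "-p", "1.0",         # assumption: keep secondaries scoring equal to the primary
        reference_fasta,
        reads_fastq,
    ]
    if mate_fastq is not None:
        cmd.append(mate_fastq)  # second file of a read pair, if any
    return cmd


cmd = build_minimap2_command("references.fasta", "reads_R1.fastq.gz", "reads_R2.fastq.gz")
print(shlex.join(cmd))
```

Note that minimap2 also caps the number of secondary alignments it retains per read (the `-N` option), so a faithful reproduction of "all" equal-quality secondaries may need that limit raised as well.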
After the sequence alignment is complete, we calculate mean sequencing depth across the entire reference ("average depth"), fraction of the reference covered by at least one read ("coverage"), and cumulative sequence identity ("identity").
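As a rough illustration of these three metrics, the sketch below computes them from a per-base depth array. The input layout and function name are hypothetical, and identity is assumed to mean matching bases divided by total aligned bases summed over all alignments to the reference, which may differ from the exact definition used in the report.

```python
def summarize_alignment(depths, matched_bases, aligned_bases):
    """Summarize alignments to one reference genome.

    depths        -- read depth at each reference position (hypothetical input)
    matched_bases -- total aligned bases that match the reference
    aligned_bases -- total aligned bases across all alignments
    """
    genome_length = len(depths)
    if genome_length == 0:
        raise ValueError("reference has no positions")

    # Mean depth over every position, including positions with zero reads.
    average_depth = sum(depths) / genome_length
    # Fraction of positions covered by at least one read.
    coverage = sum(1 for d in depths if d >= 1) / genome_length
    # Assumed definition of cumulative identity: matches / aligned bases.
    identity = matched_bases / aligned_bases if aligned_bases else 0.0

    return {"average_depth": average_depth, "coverage": coverage, "identity": identity}


# Toy 10-base reference with made-up depths.
metrics = summarize_alignment([0, 0, 3, 5, 5, 2, 0, 1, 4, 0], matched_bases=19, aligned_bases=20)
print(metrics)
```

For this toy input the average depth is 2.0, the coverage is 0.6 (six of ten positions have at least one read), and the identity is 0.95.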
The final PDF report identifies a given virus as being either "Present" or "Indeterminate" according to the following thresholds:
Present: To be considered "Present", a virus must have at least 20% of its reference genome covered, with an average depth of at least 10x across the entire genome.
Indeterminate: A virus that falls short of "Present" is still reported as "Indeterminate" as long as at least 5% of its reference genome is covered.
If a virus does not pass the "Indeterminate" threshold, it is considered not detected and is excluded from the report.
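The reporting rules above can be expressed as a small decision function. This is a sketch under two assumptions: coverage is given as a fraction between 0 and 1, and the 10x depth requirement is read as "at least 10x". The function name is hypothetical.

```python
def classify_virus(coverage, average_depth):
    """Return "Present", "Indeterminate", or None (not reported).

    coverage      -- fraction of the reference covered by at least one read (0 to 1)
    average_depth -- mean depth across the entire reference
    """
    # "Present": at least 20% of the genome covered and an average depth of 10x
    # (interpreted here as "at least 10x").
    if coverage >= 0.20 and average_depth >= 10:
        return "Present"
    # "Indeterminate": not "Present", but at least 5% of the genome covered.
    if coverage >= 0.05:
        return "Indeterminate"
    # Below both thresholds: excluded from the report.
    return None


print(classify_virus(0.85, 42.0))
print(classify_virus(0.30, 4.0))
print(classify_virus(0.02, 50.0))
```

The three calls illustrate each outcome: high coverage and depth gives "Present", adequate coverage with low depth gives "Indeterminate", and coverage under 5% gives no call regardless of depth.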