Bioinformatics Workflow

The AmpliSeq™ Antimicrobial Resistance (AMR) report on One Codex consists of a gene panel accessible on the One Codex platform and through our API, and a PDF report that summarizes the panel results. When you upload a FASTQ file generated from an AmpliSeq™ AMR Community Panel, One Codex aligns the sequences against the reference marker sequences for all 814 amplicons in the panel design. These marker sequences cover 478 antimicrobial resistance genes across 28 antibiotic classes.

We analyze these alignments to estimate the identity, coverage, and depth of each marker in your sequence data. The results are presented in two formats, a Panel and a Report, which are described below.

AmpliSeq™ AMR Panel: The Panel analysis provides marker-by-marker results. Each marker in the One Codex panel is named after the corresponding marker in the original panel design. Markers are grouped by the type of antimicrobial resistance they predict. The full panel results are also available via our API.

AmpliSeq™ AMR Report: The Report analysis summarizes the Panel results and renders them as an easy-to-interpret PDF report. In many cases, a gene is covered by multiple overlapping (tiled) amplicons. We average the statistics of these tiled markers, weighted by marker length, to estimate the mean coverage, depth, and identity of the whole gene or gene region. Each gene is then grouped by the type of antimicrobial resistance it predicts.

Bioinformatics Details

Read Alignment: Reads are aligned against the reference markers with BWA (v0.7.12). One Codex then parses the aligned BAM files and calculates summary statistics for each marker.
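To illustrate what per-marker summary statistics can look like, here is a minimal sketch in Python. It is not One Codex's implementation: the function name coverage_and_depth is hypothetical, and it assumes that coverage means the percentage of marker positions covered by at least one aligned read and that depth means the mean per-base depth across the whole marker. Alignments are given as 0-based, half-open (start, end) intervals on the marker, as they might be after parsing a BAM file.

```python
from typing import List, Tuple


def coverage_and_depth(marker_length: int,
                       alignments: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (coverage percent, mean depth) for one marker.

    Assumed definitions (not confirmed by the source):
      coverage = percent of marker positions with at least one aligned read
      depth    = mean number of reads per position over the whole marker
    """
    depth = [0] * marker_length
    for start, end in alignments:
        # Clip each alignment to the marker boundaries before counting.
        for pos in range(max(start, 0), min(end, marker_length)):
            depth[pos] += 1

    covered = sum(1 for d in depth if d > 0)
    coverage_pct = 100.0 * covered / marker_length
    mean_depth = sum(depth) / marker_length
    return coverage_pct, mean_depth


# Two overlapping reads on a 10 bp marker; the last two positions are uncovered.
coverage_pct, mean_depth = coverage_and_depth(10, [(0, 5), (3, 8)])
```

Identity is omitted here because it depends on how mismatches and gaps are counted, which the text above does not specify.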

Panel Analysis: For each marker, we use the identity and coverage scores to make a prediction of whether the marker sequence is present (called "Present" in the analyses), likely present ("Probable"), or absent ("Absent") in the sample.

Report Analysis: Overlapping markers are grouped into a single gene or gene region. Coverage, depth, and identity estimates are calculated as the average of the statistics for the overlapping markers with the highest detection status, weighted by marker length. Detection status is ranked (from highest to lowest priority) as Present, then Probable, then Absent.
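The gene-level summary described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code One Codex runs: the Marker class, its field names, and summarize_gene are hypothetical. The sketch keeps only the markers that share the highest detection status within a gene and then takes a length-weighted average of their statistics.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Ranking from the text: Present > Probable > Absent.
STATUS_RANK = {"Present": 2, "Probable": 1, "Absent": 0}


@dataclass
class Marker:
    """Hypothetical per-marker record; field names are assumptions."""
    name: str
    length: int       # marker length in bases
    coverage: float   # percent
    depth: float      # mean depth
    identity: float   # percent
    status: str       # "Present", "Probable", or "Absent"


def summarize_gene(markers: List[Marker]) -> Dict[str, Union[str, float]]:
    """Length-weighted average over the markers with the highest status."""
    best = max(STATUS_RANK[m.status] for m in markers)
    kept = [m for m in markers if STATUS_RANK[m.status] == best]
    total_length = sum(m.length for m in kept)

    def weighted(attr: str) -> float:
        return sum(getattr(m, attr) * m.length for m in kept) / total_length

    return {
        "status": kept[0].status,
        "coverage": weighted("coverage"),
        "depth": weighted("depth"),
        "identity": weighted("identity"),
    }


# Made-up values for three tiled markers of one gene.
gene = summarize_gene([
    Marker("gene_1", 100, 100.0, 50.0, 99.0, "Present"),
    Marker("gene_2", 300, 90.0, 30.0, 97.0, "Present"),
    Marker("gene_3", 200, 40.0, 2.0, 85.0, "Absent"),
])
```

In this made-up example the Absent marker is excluded, so the gene-level coverage is (100 × 100 + 90 × 300) / 400 = 92.5%.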

Scoring Thresholds

These thresholds determine the detection status called for each marker. They are also described below the results table in every AmpliSeq™ AMR Report.

Present: A marker must have coverage ≥ 85% and identity ≥ 95% to be called "Present".
Probable: A marker that falls short of the "Present" thresholds but has coverage ≥ 80% and identity ≥ 90% is called "Probable".
Absent: A marker with coverage < 80% or identity < 90% is called "Absent".
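The thresholds above can be expressed as a short decision function. The following Python sketch is illustrative only; the function name call_marker is hypothetical, and both inputs are assumed to be percentages between 0 and 100.

```python
def call_marker(coverage: float, identity: float) -> str:
    """Return the detection status for one marker.

    coverage and identity are percentages (0-100). The stricter
    "Present" thresholds are checked first, so a marker is only
    called "Probable" when it fails them.
    """
    if coverage >= 85 and identity >= 95:
        return "Present"
    if coverage >= 80 and identity >= 90:
        return "Probable"
    return "Absent"


# A marker with 82% coverage and 96% identity misses the "Present"
# coverage threshold but still meets both "Probable" thresholds.
status = call_marker(82.0, 96.0)
```

Checking the thresholds from strictest to most lenient means each marker receives exactly one status.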

Example Datasets

We've generated some example datasets to demonstrate how the panel and report look on the One Codex platform. You can check out these examples below:

EcloaAMR2.fastq.gz
BAA-2468™ Enterobacter cloacae

EcoliAMR2.fastq.gz
BAA-2340™ Escherichia coli

KPAMR2.fastq.gz
BAA-1705™ Klebsiella pneumoniae

KOAMR2.fastq.gz
51983™ Klebsiella oxytoca
