Bioinformatics Workflow

Every dataset is aligned against the set of organisms in the input mixtures, as well as the complete set of references in the One Codex Database (for WGS samples) or Targeted Loci Database (for 16S samples). The abundance of each detected organism is then used to calculate three scores, each ranging from 0% to 100%: True Positives, Relative Abundance, and False Positives.

Bioinformatics Details

Read Alignment: Alignment of reads against the set of organisms in the input mixtures is executed using BWA (v0.7.15). Paired-end reads are analyzed with the -p flag, and Samtools (v1.4) is used to process the resulting SAM file (samtools fixmate).
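
As a rough illustration of the alignment step, the sketch below only builds the two command lines and does not run them. It assumes the `bwa mem` subcommand (the text names BWA but not the subcommand), an interleaved FASTQ for paired-end input (which is what `-p` expects in `bwa mem`), and hypothetical file names; the function name `alignment_commands` is likewise illustrative.

```python
def alignment_commands(reference, reads, raw_sam, fixed_sam, paired=False):
    """Build (but do not run) the BWA and Samtools command lines.

    Assumptions not stated in the text: the `bwa mem` subcommand is used,
    paired-end reads arrive as one interleaved FASTQ, and all paths are
    placeholders supplied by the caller.
    """
    bwa = ["bwa", "mem"]
    if paired:
        # -p tells bwa mem to treat the single input file as interleaved pairs
        bwa.append("-p")
    bwa += [reference, reads]  # bwa mem writes SAM to stdout; redirect to raw_sam

    # samtools fixmate fills in mate information; -O sam keeps SAM output
    fixmate = ["samtools", "fixmate", "-O", "sam", raw_sam, fixed_sam]
    return bwa, fixmate


bwa_cmd, fixmate_cmd = alignment_commands(
    "mixture_refs.fa", "sample.fastq", "sample.sam", "sample.fixmate.sam", paired=True
)
print(" ".join(bwa_cmd), ">", "sample.sam")
print(" ".join(fixmate_cmd))
```

The commands are returned as argument lists so they can be passed to `subprocess.run` without shell quoting issues, with the BWA output redirected to the intermediate SAM file.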

Reference Databases: Further details are available for the One Codex Database (used to analyze WGS files) and the Targeted Loci Database (used to analyze 16S files).

Scoring Details

  • True Positives: The True Positive Score is the percentage of organisms in the control that are marked as "Present". An organism is marked as "Present" if it is detected within two logs (two orders of magnitude) of its true abundance. Two additional figures are reported for each organism: the "proportion of true positives", which is the number of reads mapping to an individual genome divided by the number of reads mapping to any true positive organism, and the "proportion of reads", which is the proportion of total reads in the sample mapping to a genome, adjusted for genome size.
  • Relative Abundance: The Relative Abundance Score is the Pearson correlation coefficient between the known input organism abundances and the detected abundances (based on genome-size adjusted read counts).
  • False Positives: The False Positive Score is 100% less 10 percentage points for each "High" abundance false positive, 5 points for each "Moderate" one, and 1 point for each "Low" one. "Trace" false positives do not count against the score, and the minimum possible score is 0%.
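
The three scores above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production implementation: "within two logs" is read as an absolute log10 ratio of at most 2, genome-size adjustment is taken to mean dividing read counts by genome length and renormalizing, and each false positive is assumed to arrive already labeled with its tier, since the tier thresholds are not given here. All function names are hypothetical. How the Pearson coefficient is mapped onto the 0-100% scale is not specified, so the sketch returns the raw coefficient.

```python
import math

# Penalty in percentage points per false positive, as stated in the text
FP_PENALTY = {"High": 10, "Moderate": 5, "Low": 1, "Trace": 0}


def true_positive_score(expected, detected, max_logs=2.0):
    """Percentage of expected organisms detected within max_logs (log10)
    of their true abundance. Expected abundances must be positive."""
    present = 0
    for organism, true_abundance in expected.items():
        found = detected.get(organism, 0.0)
        if found > 0 and abs(math.log10(found / true_abundance)) <= max_logs:
            present += 1
    return 100.0 * present / len(expected)


def size_adjusted_abundance(read_counts, genome_sizes):
    """Assumed adjustment: reads per base, renormalized to sum to 1."""
    adjusted = {org: n / genome_sizes[org] for org, n in read_counts.items()}
    total = sum(adjusted.values())
    return {org: value / total for org, value in adjusted.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def false_positive_score(tiers):
    """100 minus the summed tier penalties, floored at 0."""
    return max(0, 100 - sum(FP_PENALTY[tier] for tier in tiers))


# One High, one Moderate, one Low, and one Trace false positive
print(false_positive_score(["High", "Moderate", "Low", "Trace"]))  # 84
```

For example, a sample with one "High", one "Moderate", one "Low", and one "Trace" false positive scores 100 - 10 - 5 - 1 = 84%, and a sample with twelve "High" false positives is floored at 0%.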

