Q: There are some tags on my sample - what are they?

Written by Denise Lynch
Updated yesterday

When we classify a sample, we run a few additional checks. Some look at the high-level details of the sample; others look at specific organisms within it. Based on these checks, we apply automated tags to help you more easily identify samples with specific features. Note that these tags do not affect the classification of a sample; they simply serve as a quick way to find samples with similar features. You'll find details of each tag below.

  • high host: A sample is considered "high host" if

    • >=20% of the reads are mapped to the database, and

    • Homo sapiens is the species with the most reads, and

    • Homo sapiens accounts for >=50% of the mapped reads.

  • complex: A sample is tagged as "complex" if

    • >=20% of the reads have been classified, and

    • the sample is not dominated by one particular species; i.e. the top species accounts for <=25% of the species-level reads, and

    • the sample is also not dominated by one particular order; i.e. the top order accounts for <=50% of the order-level classifications.

  • isolate: A sample is estimated to be an isolate if

    • >=20% of the reads have been classified, and

    • the species with the most reads accounts for >75% of the species-level classified reads.

    • Alternatively, we consider a sample an isolate assembly if

      • the total number of reads is <1000, and

      • >=75% of the reads are mapped to the database, and

      • the species with the most reads accounts for >75% of the species-level classified reads.

  • unclassified: If fewer than 10% of the reads are classified against the One Codex Database, we tag the sample as "unclassified" so that unusual samples are easy to spot.

  • species-specific tags: If the classification results indicate that the sample appears to be an isolate, we check whether the isolate is on a list of common pathogens. If it is, we apply a tag, such as "E. coli", to help you easily identify those pathogen isolate samples.

  • ST-##: Within the classification results, we look for certain taxa. These are taxa that are commonly of clinical or environmental interest. If we find these taxa, we will launch our MLST (Multi-Locus Sequence Typing) workflow. This workflow looks for certain alleles of specific markers within your sample. If we identify the various alleles associated with a particular Sequence Type, we apply the appropriate ST tag, to help you more easily identify samples that may contain organisms of interest. You'll find more details on our MLST workflow here.
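
If you'd like to reason about how the threshold-based tags above combine, the rules can be sketched in Python. This is an illustrative sketch only, not the One Codex implementation: the SampleSummary fields and the automated_tags function are hypothetical names, and it assumes that "mapped to the database" and "classified" refer to the same read count, measured as a fraction of the total reads. The species-specific and ST-## tags are left out because they depend on a pathogen list and an MLST workflow that are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleSummary:
    """Hypothetical per-sample read counts; every field name is an assumption."""
    total_reads: int
    classified_reads: int       # assumed to equal "reads mapped to the database"
    species_level_reads: int    # reads classified at species level
    top_species: str            # species with the most reads
    top_species_reads: int
    order_level_reads: int      # reads classified at order level
    top_order_reads: int


def _share(part: int, whole: int) -> float:
    """Return part/whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def automated_tags(s: SampleSummary) -> List[str]:
    """Apply the threshold rules from the article to one sample."""
    classified = _share(s.classified_reads, s.total_reads)
    top_species = _share(s.top_species_reads, s.species_level_reads)
    top_order = _share(s.top_order_reads, s.order_level_reads)
    # Share of the top species among all classified (mapped) reads.
    host_share = _share(s.top_species_reads, s.classified_reads)

    tags: List[str] = []

    # high host: >=20% classified, Homo sapiens on top, >=50% of mapped reads
    if classified >= 0.20 and s.top_species == "Homo sapiens" and host_share >= 0.50:
        tags.append("high host")

    # complex: >=20% classified, top species <=25%, top order <=50%
    if classified >= 0.20 and top_species <= 0.25 and top_order <= 0.50:
        tags.append("complex")

    # isolate: >=20% classified and top species >75% of species-level reads
    isolate = classified >= 0.20 and top_species > 0.75
    # isolate assembly: <1000 reads, >=75% classified, top species >75%.
    # Kept as a separate branch to mirror the article's wording.
    isolate_assembly = (
        s.total_reads < 1000 and classified >= 0.75 and top_species > 0.75
    )
    if isolate or isolate_assembly:
        tags.append("isolate")

    # unclassified: fewer than 10% of reads classified
    if classified < 0.10:
        tags.append("unclassified")

    return tags


if __name__ == "__main__":
    # Made-up counts for illustration; not taken from a real sample.
    sample = SampleSummary(
        total_reads=1_000_000,
        classified_reads=900_000,
        species_level_reads=800_000,
        top_species="Escherichia coli",
        top_species_reads=760_000,
        order_level_reads=850_000,
        top_order_reads=800_000,
    )
    print(automated_tags(sample))  # ['isolate']
```

Each rule is checked independently in this sketch, so a sample can receive more than one tag (for example, a sample dominated by Homo sapiens reads could meet both the "high host" and "isolate" thresholds). Whether the real pipeline treats the tags as mutually exclusive is not stated above.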