Genome Annotation

Understand our approach to Genome Annotation for the ATCC Genome Portal

Written by Austin Davis-Richardson
Updated over a week ago

Bacterial Genome Annotation

There are currently several approaches to bacterial genome annotation [1, 2, 3]. For this reason, we make our finalized genome assembly FASTA files available for download from the genome portal and encourage customers to run their own custom annotations of the ATCC reference-grade genomes if they so choose. We also recognize the need for a readily accessible annotation in a common format for those who want to begin gene-level analysis immediately. To address this need, we provide a default annotation of ATCC reference-grade genomes generated with Prokka [2]. Briefly, Prokka relies on a number of tools to annotate coding sequences (CDSs), rRNA, tRNA, signal leader peptides, and non-coding RNA. For CDSs, Prokka uses the UniProt [4], RefSeq [5], Pfam [6], and TIGRFAM [7] databases to assign protein identity. On the genome portal, all annotated CDSs include their EC number and UniProt ID as reported by Prokka.
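For readers who want to work with the default annotation programmatically, the following is a minimal Python sketch of pulling the EC number and UniProt ID for each CDS out of a Prokka-style GFF3 file. It assumes the attribute keys Prokka typically writes (`eC_number`, `inference`, `locus_tag`, `product`) and that the UniProt accession appears in the `inference` attribute after `UniProtKB:`. The file downloaded from the portal may be laid out differently. The function name `parse_prokka_cds` and the sample records are invented for illustration.

```python
from urllib.parse import unquote


def parse_prokka_cds(lines):
    """Yield one dict per CDS feature found in an iterable of GFF3 lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows this directive; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS records

        # Column 9 holds semicolon-separated key=value attributes.
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = unquote(value)

        # Assumption: the UniProt accession is the last field of an
        # inference entry such as "similar to AA sequence:UniProtKB:P12345".
        uniprot_id = None
        for item in attrs.get("inference", "").split(","):
            if "UniProtKB:" in item:
                uniprot_id = item.rsplit(":", 1)[1]

        yield {
            "locus_tag": attrs.get("locus_tag"),
            "product": attrs.get("product"),
            "ec_number": attrs.get("eC_number"),
            "uniprot_id": uniprot_id,
        }


# Invented sample records, for illustration only.
sample_gff = [
    "##gff-version 3",
    "contig_1\tProdigal:2.6\tCDS\t10\t310\t.\t+\t0\t"
    "ID=DEMO_00001;eC_number=1.1.1.1;Name=demoA;gene=demoA;"
    "inference=ab initio prediction:Prodigal:2.6,"
    "similar to AA sequence:UniProtKB:P12345;"
    "locus_tag=DEMO_00001;product=Example dehydrogenase",
    "contig_1\tAragorn:1.2\ttRNA\t400\t475\t.\t-\t.\t"
    "ID=DEMO_00002;locus_tag=DEMO_00002;product=tRNA-Ala",
    "contig_1\tProdigal:2.6\tCDS\t500\t800\t.\t+\t0\t"
    "ID=DEMO_00003;inference=ab initio prediction:Prodigal:2.6;"
    "locus_tag=DEMO_00003;product=hypothetical protein",
    "##FASTA",
    ">contig_1",
    "ATGC",
]

for record in parse_prokka_cds(sample_gff):
    print(record)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `parse_prokka_cds(open("annotation.gff"))`, where the file name is a placeholder. CDSs without a database hit simply come back with `ec_number` and `uniprot_id` set to `None`.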

We are currently transitioning to the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [3] for bacterial annotation in the Oatmeal pipeline. PGAP combines ab initio gene prediction algorithms with homology-based methods and uses the Protein Family Models collection for structural and functional annotation. This collection comprises hidden Markov models (HMMs), BLAST rules (BlastRules), and Conserved Domain Database (CDD) architectures, which are used to assign names, gene symbols, publications, and EC numbers to proteins that meet the criteria for inclusion in a protein family.

Mycology Genome Annotation

During completeness calculations for mycology genomes, BUSCO [8] generates annotations of universal single-copy orthologs, which we make available in the genome portal. BUSCO uses Augustus (trained on BUSCO databases), tBLASTn, and HMMER3 to automatically predict and annotate single-copy coding regions of fungal genomes according to their closest relatives in fungi-specific databases.
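As an illustration of how completeness relates to these ortholog annotations, the sketch below tallies BUSCO statuses from a table laid out like BUSCO's `full_table.tsv` (BUSCO ID in the first tab-separated column, status in the second, comment lines starting with `#`). Whether the portal exposes this exact file is an assumption. The function name `summarize_busco` and the sample rows are invented for illustration.

```python
from collections import Counter


def summarize_busco(lines):
    """Return (status counts, percent of BUSCOs found complete)."""
    status_by_id = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) < 2:
            continue
        # A duplicated BUSCO has one row per copy; keying on the ID
        # counts it once.
        status_by_id[cols[0]] = cols[1]

    counts = Counter(status_by_id.values())
    total = len(status_by_id)
    # Both single-copy ("Complete") and "Duplicated" hits are complete genes.
    found = counts["Complete"] + counts["Duplicated"]
    percent_complete = 100.0 * found / total if total else 0.0
    return dict(counts), percent_complete


# Invented sample rows, for illustration only.
sample_table = [
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength",
    "demo1at4751\tComplete\tcontig_1\t100\t900\t+\t512.3\t266",
    "demo2at4751\tDuplicated\tcontig_1\t1500\t2400\t-\t430.0\t300",
    "demo2at4751\tDuplicated\tcontig_2\t200\t1100\t+\t428.7\t300",
    "demo3at4751\tFragmented\tcontig_3\t50\t350\t+\t120.5\t100",
    "demo4at4751\tMissing",
]

counts, percent_complete = summarize_busco(sample_table)
print(counts, percent_complete)
```

The percentage here is simply complete plus duplicated BUSCOs divided by all BUSCOs searched. It is a sketch of the idea and is not presented as the exact calculation used for the portal.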

Viral Genome Annotation

Viral annotations are not currently included on the ATCC Genome Portal. We are working to add viral annotation to the Oatmeal pipeline by running the VIGA program [9] on the finalized viral assembly. While we encourage customers who want complete control to download the reference-grade genome FASTA assembly and conduct their own annotations, we will also make the VIGA-generated annotation files available for download for ease of use and immediate data analysis.

References

  1. Overbeek R, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Research, 42(D1): D206–D214, 2014. PubMed: 24293654

  2. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14): 2068–2069, 2014. PubMed: 24642063

  3. Tatusova T, et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Research, 44(14): 6614–6624, 2016. PubMed: 27342282

  4. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research, 43(D1): D204–D212, 2015. PubMed: 25348405

  5. Tatusova T, et al. RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Research, 42(D1): D553–D559, 2014. PubMed: 24316578

  6. Finn RD, et al. Pfam: the protein families database. Nucleic Acids Research, 42(D1): D222–D230, 2014. PubMed: 24288371

  7. Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic Acids Research, 31(1), 371–373, 2003. PubMed: 12520025

  8. Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. Methods in Molecular Biology, 1962: 227–245, 2019.

  9. González-Tortuero E, et al. VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator. bioRxiv, 277509, 2018.

   
