A series of pipelines was used to generate the assemblies, QC results, and annotations available on the ATCC Genome Portal. Information about the pipeline version used for a genome can be found in the “Notes” field of the “Quality Control” tab for that genome on the ATCC Genome Portal. We are currently unifying the pipeline versions displayed in the “Notes” field. The absence of a “Notes” field denotes a One Codex-derived assembly, while an Oatmeal version captured in the notes identifies the pipeline version used to curate the genome. A short “manual assembly” disclaimer denotes an assembly produced with an earlier, pre-production version of the Oatmeal pipeline. Here, we provide a change log for all pipelines. If you are unsure which pipeline or version was used for your genome of interest, please contact us through the message box at the bottom-right of your screen.
Assembly and Annotation Pipelines
“Oatmeal” is a benchmarked microbial assembly pipeline built and maintained by the ATCC Bioinformatics team. The pipeline is designed to produce reliable and authentic genome assemblies across each generalized microbial kingdom. Effective from the deployment dates listed below, each new genome on the ATCC Genome Portal has been assembled and curated through the Oatmeal pipeline. As the pipeline evolves over time, the change log will reflect updates to the software and methodology.
Oatmeal v1.0
July 01, 2023 - Current
Fastp (v0.23.2) read trimming and filtering
Nanofilt (v2.8.0) filtering of long reads – Phred Q10 and >1000nt
Kraken (v2.1.2) short read classification and read binning
Seqkit (v2.1.0) deduplication and downsampling to 100X
bbnorm (v38.62) - Illumina
Filtlong (v0.2.1) - ONT
Unicycler (v0.4.8) hybrid assembly
Polishing of the assembly with Polypolish (v0.5.0)
Quality Control
CheckM (v1.1.3) completeness ≥ 95%
CheckM contamination ≤ 5%
Number of contigs ≤ 30
Illumina and ONT coverage ≥ 100X
Must pass graph QC (≤ 15 connecting contigs)
Assembly status
GOLD – Passes all QC criteria listed above and all contigs circularized
BETA – Passes all QC criteria listed above but not all contigs circularized
PGAP (2022-12-13.build6494) bacterial annotator
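The quality-control thresholds and the GOLD/BETA rule listed above can be expressed as a small decision function. The following is a minimal sketch, not the Oatmeal implementation: the `AssemblyMetrics` fields, the function name, and the "FAIL" label for assemblies that miss a criterion are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssemblyMetrics:
    """Hypothetical container for the values checked in Quality Control."""
    completeness: float        # CheckM completeness, in percent
    contamination: float       # CheckM contamination, in percent
    n_contigs: int             # number of contigs in the assembly
    illumina_coverage: float   # mean Illumina depth, in X
    ont_coverage: float        # mean ONT depth, in X
    connecting_contigs: int    # connecting contigs in the assembly graph
    all_circular: bool         # True if every contig is circularized


def assembly_status(m: AssemblyMetrics) -> str:
    """Return "GOLD", "BETA", or the hypothetical label "FAIL"."""
    passes_qc = (
        m.completeness >= 95.0
        and m.contamination <= 5.0
        and m.n_contigs <= 30
        and m.illumina_coverage >= 100.0
        and m.ont_coverage >= 100.0
        and m.connecting_contigs <= 15
    )
    if not passes_qc:
        return "FAIL"  # assumption: the source does not name this state
    return "GOLD" if m.all_circular else "BETA"


if __name__ == "__main__":
    example = AssemblyMetrics(99.1, 0.4, 3, 180.0, 120.0, 0, True)
    print(assembly_status(example))
```

In this sketch, circularization is the only difference between GOLD and BETA, which mirrors the two status definitions above.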
June 01, 2023 - Current
Fastp (v0.23.2) read trimming and filtering
BWA (v0.7.17) Illumina mapping and removal of Eukaryotic host reads
(DNA Viruses) - Minimap2 (2.23-r1111) ONT mapping and removal of host reads
Spades (v3.14.1) assembly of non-host reads
(DNA Viruses) – Hybrid assembly of non-host reads
Retention of desired contigs over 500bp by taxID-specific Blast (v2.13)
Quality Control
CheckV (v1.0.1) completeness ≥ 80%
Assembly length within 10% of expected length of reference genome
All viral segments must be present, with no more than 9 contigs beyond the expected contig count
VIGA (v0.11.2) viral annotator
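The viral quality-control criteria above can be combined into a single check. This is a sketch under stated assumptions rather than the production logic: the function and parameter names are hypothetical, "within 10%" is read as an absolute length difference of at most 10% of the reference length, and the contig rule is read as allowing at most 9 contigs more than the expected count.

```python
from typing import Iterable


def viral_qc_passes(
    checkv_completeness: float,
    assembly_length: int,
    reference_length: int,
    expected_segments: Iterable[str],
    found_segments: Iterable[str],
    n_contigs: int,
    expected_contigs: int,
) -> bool:
    """Apply the three viral QC criteria; all names here are illustrative."""
    # CheckV completeness must be at least 80%.
    complete_enough = checkv_completeness >= 80.0

    # Assembly length must be within 10% of the reference length
    # (assumption: measured as an absolute difference).
    length_ok = abs(assembly_length - reference_length) <= 0.10 * reference_length

    # Every expected segment must be present in the assembly.
    segments_ok = set(expected_segments) <= set(found_segments)

    # At most 9 contigs beyond the expected contig count (assumed reading).
    contigs_ok = n_contigs <= expected_contigs + 9

    return complete_enough and length_ok and segments_ok and contigs_ok


if __name__ == "__main__":
    print(viral_qc_passes(95.0, 10_500, 10_000, ["L", "M", "S"], ["L", "M", "S"], 4, 3))
```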
Rollout date anticipated 01 August 2023
One Codex Pipelines
Many of the assemblies on the ATCC Genome Portal were assembled and curated through the One Codex assembly pipelines (documented below) before the Oatmeal pipeline was deployed.
Assembly Pipelines
Date: April 25, 2019
Initial Bacterial hybrid assembly pipeline
Runs readsQC to quality trim both Illumina and Oxford Nanopore Technologies (ONT) reads
Runs Unicycler (v0.4.4) to assemble genome
Date: August 1, 2019
Runs fastp to trim Illumina reads
Runs filtlong to trim ONT reads
Downsamples Illumina reads to 150X genome depth and ONT reads to 60X
Updates Unicycler to v0.4.8
Date: December 11, 2019
Downsamples ONT reads to 30X genome depth
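Downsampling to a target genome depth, as in the entries above, comes down to simple arithmetic: depth is total sequenced bases divided by genome size, and the fraction of reads to keep is the target depth divided by the current depth. The sketch below illustrates that calculation only. The function name and inputs are hypothetical, and it does not describe how the One Codex pipeline selects reads.

```python
def downsample_fraction(total_bases: int, genome_size: int, target_depth: float) -> float:
    """Fraction of reads to keep so that mean depth is about target_depth.

    Returns 1.0 when the read set is already at or below the target depth.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    current_depth = total_bases / genome_size
    if current_depth <= target_depth:
        return 1.0
    return target_depth / current_depth


if __name__ == "__main__":
    genome_size = 5_000_000  # illustrative 5 Mb bacterial genome
    # 1.5 Gb of Illumina data is 300X, so half the reads give 150X.
    print(downsample_fraction(1_500_000_000, genome_size, 150))
    # 300 Mb of ONT data is 60X, so half the reads give 30X.
    print(downsample_fraction(300_000_000, genome_size, 30))
```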
Date: July 15, 2020
Runs fastp on Illumina reads
Uses SPAdes to assemble
minimap2 aligns trimmed reads to assembly
Masks low depth (<10X) regions
Date: April 29, 2021
Trims terminal masked regions from assemblies
Adds modification to Unicycler to raise exception if Racon runs out of memory
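The low-depth masking (<10X) and terminal trimming steps described in the entries above can be illustrated on a single contig. This is a minimal sketch with hypothetical function names. It assumes per-base depths are already available (for example from the minimap2 alignments) and that masked bases are written as "N", which the source does not specify.

```python
from typing import Sequence


def mask_low_depth(sequence: str, depths: Sequence[int], min_depth: int = 10) -> str:
    """Replace every base whose depth is below min_depth with 'N'."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(sequence, depths)
    )


def trim_terminal_masked(sequence: str) -> str:
    """Remove masked bases from both ends; internal masked bases are kept."""
    return sequence.strip("N")


if __name__ == "__main__":
    masked = mask_low_depth("ACGTACGT", [2, 5, 12, 30, 9, 40, 11, 3])
    print(masked)                        # NNGTNCGN
    print(trim_terminal_masked(masked))  # GTNCG
```

Only the ends are trimmed, so a low-depth region inside a contig stays masked and the contig coordinates downstream of it are preserved.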
Date: July 10, 2020
Initial assembly pipeline for hybrid fungal assemblies
Runs fastp on Illumina reads
Runs Filtlong on ONT reads
Runs MaSuRCA with FLYE on filtered read sets
Date: March 23, 2021
Estimates genome size on Illumina reads
Adds downsampling of filtered Illumina reads to 150X depth of estimated genome size
Adds downsampling of ONT reads to 30X depth of estimated genome size
Assembly QC pipeline
Date: October 30, 2018
Initial bacterial hybrid assembly QC pipeline
Uses CheckM to assess assembly quality, completeness, and contamination
Date: June 7, 2019
Maps trimmed ONT reads to assembly using minimap2 to calculate ONT depth
Maps trimmed Illumina reads to assembly using BWA
Adds custom script to calculate other assembly statistics
Date: July 13, 2020
Initial virology assembly QC pipeline
Aligns contigs to reference database to identify the best reference species
Checks if all segments in each reference species are present in assembly, using GenBank segment information
Reports alignment quality
Date: August 05, 2020
Includes sub-species sequences in reference database
Date: December 10, 2020
Uses alignment results to identify segments, in place of GenBank segment information
Date: January 21, 2021
Calculates assembly completeness score (assembly length / reference length)
Date: September 24, 2020
Initial mycology assembly QC pipeline
Maps raw reads to assembly to calculate depth
Runs BUSCO 4.1.2 with BUSCO database Fungi_ODB10 to calculate assembly completeness score
Calculates additional assembly statistics
Annotation Pipelines
Date: September 7, 2018
Initial bacterial annotation pipeline
Uses prokka with genus-specific BLAST database
Date: September 18, 2019
Does not use genus-specific BLAST database
Virology (Variant calling)
Date: October 18, 2020
blastn aligns reference genome to assembly to identify matching segments
Uses MAFFT to align matching reference and assembly segments
Custom script examines MAFFT alignment results for variant types
Date: November 24, 2020
Improves method for identifying variant types
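Examining a pairwise alignment for variant types, as described above, can be sketched as a single pass over the aligned columns. The code below is an illustration only and is not the custom script. It assumes the reference and assembly segments are already aligned to equal length with "-" as the gap character, and the three variant types it reports (substitution, insertion, deletion) are an assumption, since the source does not list the types.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str, str]  # (type, reference position, ref, alt)


def call_variants(ref_aln: str, asm_aln: str) -> List[Variant]:
    """Classify differences between two aligned sequences.

    Substitutions and deletions report the 1-based reference position of
    the (first) affected base. Insertions report the number of reference
    bases that precede the inserted sequence.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, asm_aln = ref_aln.upper(), asm_aln.upper()
    variants: List[Variant] = []
    ref_pos = 0  # reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, a = ref_aln[i], asm_aln[i]
        if r == "-" and a == "-":
            i += 1  # gap-only column carries no information
        elif r == "-":
            j = i  # extend over the whole run of inserted bases
            while j < n and ref_aln[j] == "-" and asm_aln[j] != "-":
                j += 1
            variants.append(("insertion", ref_pos, "", asm_aln[i:j]))
            i = j
        elif a == "-":
            j = i  # extend over the whole run of deleted bases
            while j < n and asm_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("deletion", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r != a:
                variants.append(("substitution", ref_pos, r, a))
            i += 1
    return variants


if __name__ == "__main__":
    for variant in call_variants("ACGT-ACGT", "ACCTGA--T"):
        print(variant)
```

Adjacent gap columns are merged into one event so that a multi-base insertion or deletion is reported once rather than per base.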
Date: October 09, 2020
Initial annotation pipeline for fungal assemblies
Runs BUSCO 4.1.2 with BUSCO database Fungi_ODB10 for annotations of Universal Single-Copy Orthologs