Assembly, QC and annotation pipeline versions
View the changelogs for the pipelines used to assemble and annotate ATCC genomes
Written by Denise Lynch
Updated over a week ago

Several pipelines were used to generate the assemblies, QC results, and annotations available in the ATCC genome portal. This page provides a changelog for those pipelines. If you are unsure which pipeline or version was used for your genome of interest, reach out to us through the message box at the bottom-right of your screen.

Assembly Pipelines

Bacteriology

  • Date: April 25, 2019

    • Initial Bacterial hybrid assembly pipeline

    • Runs readsQC to quality trim both Illumina and Oxford Nanopore Technologies (ONT) reads

    • Runs Unicycler (v0.4.4) to assemble genome

  • Date: August 1, 2019

    Changes:

    • Runs fastp to trim Illumina reads

    • Runs Filtlong to trim ONT reads

    • Downsamples Illumina reads to 150X genome depth and ONT reads to 60X

    • Updates Unicycler to v0.4.8

  • Date: December 11, 2019

    Changes:

    • Downsamples ONT reads to 30X genome depth
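
The downsampling targets above (150X for Illumina, 30X for ONT) imply a simple calculation: compare the current mean depth with the target and keep the matching fraction of reads. The following is a minimal sketch of that arithmetic only, not the pipeline's actual code. The function name, the genome size, and the read totals are hypothetical values chosen for illustration.

```python
def downsample_fraction(total_bases: int, genome_size: int, target_depth: float) -> float:
    """Return the fraction of reads to keep so that mean depth is about target_depth."""
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_depth = total_bases / genome_size
    # Never upsample: if the data are already at or below target, keep everything.
    return min(1.0, target_depth / current_depth)


# Hypothetical 5 Mbp genome with 1.5 Gbp of Illumina data (300X), target 150X.
illumina_fraction = downsample_fraction(1_500_000_000, 5_000_000, 150)

# Hypothetical 200 Mbp of ONT data (40X), target 30X.
ont_fraction = downsample_fraction(200_000_000, 5_000_000, 30)

print(illumina_fraction, ont_fraction)
```

A fraction of 1.0 means the read set is already at or below the target depth and would be left unchanged.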

Virology

  • Date: July 15, 2020

    • Runs fastp on Illumina reads

    • Uses SPAdes to assemble

    • minimap2 aligns trimmed reads to assembly

    • Masks low depth (<10X) regions

  • Date: April 29, 2021

    Changes:

    • Trims terminal masked regions from assemblies

    • Adds modification to Unicycler to raise exception if Racon runs out of memory
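
The masking and trimming steps above can be illustrated with a short sketch. It assumes that masked bases are written as N and that per-base depth is already available as a list; neither detail is stated in the changelog, and the function names are hypothetical.

```python
from typing import Sequence


def mask_low_depth(sequence: str, depths: Sequence[int], min_depth: int = 10) -> str:
    """Replace every base whose depth is below min_depth with 'N' (assumed mask character)."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(sequence, depths)
    )


def trim_terminal_masked(sequence: str) -> str:
    """Remove masked bases from both ends; internal masked bases are kept."""
    return sequence.strip("N")


# Toy contig with made-up per-base depths.
contig = "ACGTACGTAC"
depths = [2, 5, 12, 30, 8, 40, 41, 39, 9, 3]

masked = mask_low_depth(contig, depths)
trimmed = trim_terminal_masked(masked)
print(masked, trimmed)
```

In this toy case the low-depth ends are removed while the single internal low-depth base stays masked, which keeps the remaining coordinates contiguous.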

Mycology

  • Date: July 10, 2020

    • Initial assembly pipeline for hybrid fungal assemblies

    • Runs fastp on Illumina reads

    • Runs Filtlong on ONT reads

    • Runs MaSuRCA with Flye on filtered read sets

  • Date: March 23, 2021

    Changes:

    • Estimates genome size on Illumina reads

    • Adds downsampling of filtered Illumina reads to 150X depth of estimated genome size

    • Adds downsampling of ONT reads to 30X depth of estimated genome size

Assembly QC Pipelines

Bacteriology

  • Date: October 30, 2018

    • Initial bacterial hybrid assembly QC pipeline

    • Uses CheckM to assess assembly quality, completeness and contamination

  • Date: June 7, 2019

    Changes:

    • Maps trimmed ONT reads to assembly using minimap2 to calculate ONT depth

    • Maps trimmed Illumina reads to assembly using BWA

    • Adds custom script to calculate other assembly statistics

Virology

  • Date: July 13, 2020

    • Initial virology assembly QC pipeline

    • Aligns contigs to reference database to identify the best reference species

    • Checks if all segments in each reference species are present in assembly, using GenBank segment information

    • Reports alignment quality

  • Date: August 5, 2020

    Changes:

    • Includes sub-species sequences in reference database

  • Date: December 10, 2020

    Changes:

    • Uses alignment results to identify segments, in place of GenBank segment information

  • Date: January 21, 2021

    Changes:

    • Calculates assembly completeness score (assembly length / reference length)
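
The completeness score added in January 2021 is the ratio given above, assembly length divided by reference length. A minimal sketch of that formula follows; the function name and the lengths are hypothetical, and how the pipeline handles multi-segment genomes is not covered here.

```python
def completeness_score(assembly_length: int, reference_length: int) -> float:
    """Return assembly length / reference length, as described in the changelog."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    if assembly_length < 0:
        raise ValueError("assembly_length must not be negative")
    return assembly_length / reference_length


# Hypothetical 29,500 bp assembly compared with a 30,000 bp reference.
score = completeness_score(29_500, 30_000)
print(f"{score:.3f}")
```

A score near 1.0 suggests the assembly covers about the full reference length; a score above 1.0 is possible when the assembly is longer than the reference.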

Mycology

  • Date: September 24, 2020

    • Initial mycology assembly QC pipeline

    • Maps raw reads to assembly to calculate depth

    • Runs BUSCO 4.1.2 with BUSCO database fungi_odb10 to calculate assembly completeness score

    • Calculates additional assembly statistics

Annotation Pipelines

Bacteriology

  • Date: September 7, 2018

    • Initial bacterial annotation pipeline

    • Uses Prokka with genus-specific BLAST database

  • Date: September 18, 2019

    Changes:

    • Does not use genus-specific BLAST database

Virology (Variant calling)

  • Date: October 18, 2020

    • blastn aligns reference genome to assembly to identify matching segments

    • Uses MAFFT to align matching reference and assembly segments

    • Custom script examines MAFFT alignment results for variant types

  • Date: November 24, 2020

    Changes:

    • Improves method for identifying variant types
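
To illustrate what examining an alignment for variant types can involve, here is a minimal sketch that walks two aligned sequences column by column. It is not the custom script referenced above: the function name, the three variant labels, and the toy sequences are assumptions made for illustration, and gaps are assumed to be written as "-", as in MAFFT FASTA output.

```python
from typing import List, Tuple


def classify_variants(ref_aln: str, asm_aln: str) -> List[Tuple[int, str, str, str]]:
    """Return (reference_position, variant_type, ref_allele, asm_allele) per differing column.

    Positions are 1-based reference coordinates; an insertion is reported at
    the last reference base that precedes it.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have the same length")

    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    for ref_base, asm_base in zip(ref_aln.upper(), asm_aln.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == asm_base:
            continue  # identical column, including gap-only columns
        if ref_base == "-":
            variants.append((ref_pos, "insertion", "-", asm_base))
        elif asm_base == "-":
            variants.append((ref_pos, "deletion", ref_base, "-"))
        else:
            variants.append((ref_pos, "substitution", ref_base, asm_base))
    return variants


# Toy alignment: one substitution, one insertion, one deletion.
variants = classify_variants("ACG-TAC", "ATGGT-C")
for variant in variants:
    print(variant)
```

This sketch reports each differing column separately; a production script would likely merge adjacent gap columns into single insertion or deletion events.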

Mycology

  • Date: October 9, 2020

    • Initial annotation pipeline for fungal assemblies

    • Runs BUSCO 4.1.2 with BUSCO database fungi_odb10 for annotations of Universal Single-Copy Orthologs
