A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees
- PMID: 34469548
- PMCID: PMC8662617
- DOI: 10.1093/molbev/msab264
Abstract
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred from unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
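To illustrate the idea behind a mutation-annotated tree, the following is a minimal Python sketch, not the on-disk MAT format and not the matUtils implementation. It assumes each node stores only the mutations on the branch leading to it, plus an optional clade label at a clade root; a sample's differences from the root sequence are then recovered by replaying mutations along the root-to-sample path. The `Node` class, the helper names, and the toy positions and labels are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One branch mutation: (1-based position, parent allele, derived allele).
Mutation = Tuple[int, str, str]


@dataclass
class Node:
    """Hypothetical tree node: mutations live on the branch above the node."""
    name: str
    mutations: List[Mutation] = field(default_factory=list)
    clade: Optional[str] = None  # set only on nodes that are clade roots
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(node: Node) -> List[Node]:
    """Return the nodes on the path from the root down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path[::-1]


def variants(node: Node) -> Dict[int, str]:
    """Alleles that differ from the root sequence, by replaying branch mutations."""
    root_allele: Dict[int, str] = {}
    current_allele: Dict[int, str] = {}
    for n in path_from_root(node):
        for pos, parent_allele, derived in n.mutations:
            root_allele.setdefault(pos, parent_allele)
            current_allele[pos] = derived
    # A back-mutation restores the root allele, so it is not reported.
    return {p: a for p, a in current_allele.items() if a != root_allele[p]}


def clade_of(node: Node) -> Optional[str]:
    """Nearest clade label found walking from `node` up toward the root."""
    for n in reversed(path_from_root(node)):
        if n.clade is not None:
            return n.clade
    return None


# Toy tree with illustrative values only.
root = Node("root")
inner = root.add_child(
    Node("inner", mutations=[(241, "C", "T"), (23403, "A", "G")], clade="B.1")
)
s1 = inner.add_child(Node("sample1", mutations=[(3037, "C", "T")]))
s2 = inner.add_child(Node("sample2", mutations=[(241, "T", "C")]))

print(variants(s1))
print(variants(s2))
print(clade_of(s1))
```

Because sequences are never stored in full, the storage cost in this sketch grows with the number of branch mutations rather than with the number of samples times the genome length, which is the intuition behind the efficiency claim above.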
Keywords: COVID-19; SARS-CoV-2 phylogenetics; genomic surveillance.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Update of
- A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees. bioRxiv [Preprint]. 2021 Jul 13:2021.04.03.438321. doi: 10.1101/2021.04.03.438321. Update in: Mol Biol Evol. 2021 Dec 9;38(12):5819-5824. doi: 10.1093/molbev/msab264. PMID: 33821270.
