Petabase-scale sequence alignment catalyses viral discovery
- PMID: 35082445
- DOI: 10.1038/s41586-021-04332-2
Abstract
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially (ref. 1). Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
© 2022. The Author(s), under exclusive licence to Springer Nature Limited.
Similar articles
- A structural and primary sequence comparison of the viral RNA-dependent RNA polymerases. Nucleic Acids Res. 2003 Apr 1;31(7):1821-9. doi: 10.1093/nar/gkg277. PMID: 12654997. Free PMC article.
- Unmapped RNA Virus Diversity in Termites and their Symbionts. Viruses. 2020 Oct 9;12(10):1145. doi: 10.3390/v12101145. PMID: 33050289. Free PMC article.
- Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell. 2022 Oct 13;185(21):4023-4037.e18. doi: 10.1016/j.cell.2022.08.023. Epub 2022 Sep 28. PMID: 36174579.
- Metagenomics reshapes the concepts of RNA virus evolution by revealing extensive horizontal virus transfer. Virus Res. 2018 Jan 15;244:36-52. doi: 10.1016/j.virusres.2017.10.020. Epub 2017 Nov 8. PMID: 29103997. Free PMC article. Review.
- Evolution and taxonomy of positive-strand RNA viruses: implications of comparative analysis of amino acid sequences. Crit Rev Biochem Mol Biol. 1993;28(5):375-430. doi: 10.3109/10409239309078440. PMID: 8269709. Review.
Cited by
- Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2024 Jan 5;52(D1):D33-D43. doi: 10.1093/nar/gkad1044. PMID: 37994677. Free PMC article.
- First Evidence of Past and Present Interactions between Viruses and the Black Soldier Fly, Hermetia illucens. Viruses. 2022 Jun 11;14(6):1274. doi: 10.3390/v14061274. PMID: 35746744. Free PMC article.
- COWID: an efficient cloud-based genomics workflow for scalable identification of SARS-COV-2. Brief Bioinform. 2023 Sep 20;24(5):bbad280. doi: 10.1093/bib/bbad280. PMID: 37738400. Free PMC article.
- Petascale Homology Search for Structure Prediction. bioRxiv [Preprint]. 2023 Jul 11:2023.07.10.548308. doi: 10.1101/2023.07.10.548308. Update in: Cold Spring Harb Perspect Biol. 2024 May 2;16(5):a041465. doi: 10.1101/cshperspect.a041465. PMID: 37503235. Free PMC article. Updated. Preprint.
- Slippery when wet: cross-species transmission of divergent coronaviruses in bony and jawless fish and the evolutionary history of the Coronaviridae. Virus Evol. 2021 May 31;7(2):veab050. doi: 10.1093/ve/veab050. eCollection 2021. PMID: 34527280. Free PMC article.
