Analysis of Emerging Variants in Structured Regions of the SARS-CoV-2 Genome

Evol Bioinform Online. 2021 May 5;17:11769343211014167. doi: 10.1177/11769343211014167. eCollection 2021.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has motivated a widespread effort to understand its epidemiology and pathogenic mechanisms. Modern high-throughput sequencing technology has led to the deposition of vast numbers of SARS-CoV-2 genome sequences in curated repositories, which have been useful in mapping the spread of the virus around the globe. They also provide a unique opportunity to observe virus evolution in real time. Here, we evaluate two sets of SARS-CoV-2 genomic sequences to identify emerging variants within structured cis-regulatory elements of the SARS-CoV-2 genome. Overall, 20 variants are present at a minor allele frequency of at least 0.5%. Several enhance the stability of Stem Loop 1 in the 5' untranslated region (UTR), including a group of co-occurring variants that extend its length. One appears to modulate the stability of the frameshifting pseudoknot between ORF1a and ORF1b, and another perturbs a bi-stable molecular switch in the 3'UTR. Finally, 5 variants destabilize structured elements within the 3'UTR hypervariable region, including the S2M (stem loop 2 motif) selfish genetic element, raising questions as to the functional relevance of these structures in viral replication. Two of the most abundant variants appear to be caused by RNA editing, suggesting host-viral defense contributes to SARS-CoV-2 genome heterogeneity. Our analysis has implications for the development of therapeutics that target viral cis-regulatory RNA structures or sequences.
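
The 0.5% minor allele frequency cutoff mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `minor_allele_frequencies`, its parameters, and the choice to ignore gaps and ambiguous bases are all assumptions made for illustration. It takes pre-aligned sequences of equal length and reports each 1-based position whose second most common base reaches the threshold.

```python
from collections import Counter


def minor_allele_frequencies(aligned_seqs, threshold=0.005):
    """Return {position: (minor_base, frequency)} for alignment columns
    whose second most common base reaches `threshold`.

    Hypothetical helper: assumes sequences are already aligned and
    skips gaps and ambiguity codes when counting.
    """
    if not aligned_seqs:
        return {}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    hits = {}
    for pos in range(length):
        # Count only unambiguous nucleotides in this column.
        counts = Counter(
            s[pos].upper() for s in aligned_seqs if s[pos].upper() in "ACGTU"
        )
        if len(counts) < 2:
            continue  # invariant column, nothing to report
        total = sum(counts.values())
        # The second entry of most_common(2) is the minor allele.
        _, (minor_base, minor_count) = counts.most_common(2)
        freq = minor_count / total
        if freq >= threshold:
            hits[pos + 1] = (minor_base, freq)  # 1-based coordinate
    return hits
```

For example, calling the function on 198 copies of `"ACGT"` plus 2 copies of `"ACTT"` flags position 3, where `T` is the minor allele in 2 of 200 sequences. Real analyses would also need to handle sequencing errors and uneven sampling, which this sketch does not attempt.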

Keywords: COVID-19; RNA structure; SARS; coronavirus; phylogeny.