Structural phylogenetic analysis reveals lineage-specific RNA repetitive structural motifs in all coronaviruses and associated variations in SARS-CoV-2

Virus Evol. 2021 Jun 16;7(1):veab021. doi: 10.1093/ve/veab021. eCollection 2021 Jan.

Abstract

In many single-stranded (ss) RNA viruses, the cis-acting packaging signal that confers selectivity genome packaging usually encompasses short structured RNA repeats. These structural units, termed repetitive structural motifs (RSMs), potentially mediate capsid assembly by specific RNA-protein interactions. However, general knowledge of the conservation and/or the diversity of RSMs in the positive-sense ssRNA coronaviruses (CoVs) is limited. By performing structural phylogenetic analysis, we identified a variety of RSMs in nearly all CoV genomic RNAs, which are exclusively located in the 5'-untranslated regions (UTRs) and/or in the inter-domain regions of poly-protein 1ab coding sequences in a lineage-specific manner. In all alpha- and beta-CoVs, except for Embecovirus spp, two to four copies of 5'-gUUYCGUc-3' RSMs displaying conserved hexa-loop sequences were generally identified in Stem-loop 5 (SL5) located in the 5'-UTRs of genomic RNAs. In Embecovirus spp., however, two to eight copies of 5'-agc-3'/guAAu RSMs were found in the coding regions of non-structural protein (NSP) 3 and/or NSP15 in open reading frame (ORF) 1ab. In gamma- and delta-CoVs, other types of RSMs were found in several clustered structural elements in 5'-UTRs and/or ORF1ab. The identification of RSM-encompassing structural elements in all CoVs suggests that these RNA elements play fundamental roles in the life cycle of CoVs. In the recently emerged SARS-CoV-2, beta-CoV-specific RSMs are also found in its SL5, displaying two copies of 5'-gUUUCGUc-3' motifs. However, multiple sequence alignment reveals that the majority of SARS-CoV-2 possesses a variant RSM harboring SL5b C241U, and intriguingly, several variations in the coding sequences of viral proteins, such as Nsp12 P323L, S protein D614G, and N protein R203K-G204R, are concurrently found with such variant RSM. In conclusion, the comprehensive exploration for RSMs reveals phylogenetic insights into the RNA structural elements in CoVs as a whole and provides a new perspective on variations currently found in SARS-CoV-2.

Keywords: SARS-CoV-2; coronavirus; repetitive structural motifs; structural phylogenetics.