Virus Evol. 2020 Dec 30;7(1):veaa098.
doi: 10.1093/ve/veaa098. eCollection 2021 Jan.

Synonymous mutations and the molecular evolution of SARS-CoV-2 origins

Hongru Wang et al. Virus Evol.

Abstract

Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than that between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02, based on synonymous divergence, is 51.71 years (95% CI, 28.11-75.31) and 37.02 years (95% CI, 18.19-55.85), respectively.
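As a rough illustration of how a divergence time can follow from synonymous divergence, the sketch below assumes a strict molecular clock, under which a pairwise distance accumulates along both lineages (dS = 2 x rate x t). The abstract does not state the substitution rate or the dating method, so the function names, the rate argument, and the clock assumption are all illustrative and are not the authors' procedure. The second helper only restates the reported intervals: it returns the midpoint and half-width of a confidence interval, and the standard error that would be implied if the interval were a symmetric normal approximation (also an assumption).

```python
def tmrca_years(pairwise_ds, rate_per_site_per_year):
    """Divergence time under a strict clock: dS = 2 * rate * t.

    Both arguments are placeholders; the abstract does not report the
    rate used, so any value passed here is an assumption.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_ds / (2.0 * rate_per_site_per_year)


def interval_summary(lower, upper, z=1.96):
    """Midpoint, half-width, and implied standard error of an interval.

    The implied standard error is meaningful only if the interval is a
    symmetric normal-approximation interval (an assumption).
    """
    midpoint = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0
    return midpoint, half_width, half_width / z


# Intervals exactly as reported in the abstract (years).
reported = [("RaTG13", 28.11, 75.31), ("RmYN02", 18.19, 55.85)]
for label, lower, upper in reported:
    mid, half, se = interval_summary(lower, upper)
    print(f"{label}: midpoint={mid:.2f}, half-width={half:.2f}, implied SE={se:.2f}")
```

The midpoints of the two reported intervals coincide with the reported point estimates (51.71 and 37.02 years), which is what one would expect from symmetric intervals.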

Keywords: SARS-CoV-2; molecular evolution; synonymous mutations.

Figures

Figure 1.
Genome-wide identity plot and top BLAST hits for SARS-CoV-2, RaTG13, and RmYN02. (a) Nucleotide identity in 300-bp sliding windows between SARS-CoV-2 and the four most closely related viral strains, RmYN02, RaTG13, GD410721, and GX_P1E. Orange shading marks the recombinant region in SARS-CoV-2 inferred by 3SEQ (details in Supplementary Table S5). (b) All viral strains that are the unique best BLAST hit in at least three 100-bp windows when SARS-CoV-2 is used as the query, with the regions where each strain is the top BLAST hit marked. (c-f) Figures for RaTG13 (c, d) and RmYN02 (e, f) generated in the same way as those for SARS-CoV-2 in (a) and (b). The ACE2 contact residues of the RBD region (left) and the furin sites (right) of the S protein are marked in both plots with gray lines.
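The sliding-window identity in panel (a) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the two sequences are already aligned to equal length, skips columns where either sequence has a gap, and uses a step size of 100 bp. The caption gives only the 300-bp window size, so the step, the gap handling, and the function name are assumptions.

```python
def sliding_identity(aln_a, aln_b, window=300, step=100):
    """Fraction of matching sites per window for two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (assumption).
    Returns a list of (window_start, identity) tuples, 0-based starts.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        matches = 0
        compared = 0
        for x, y in zip(aln_a[start:start + window], aln_b[start:start + window]):
            if x == "-" or y == "-":
                continue
            compared += 1
            if x.upper() == y.upper():
                matches += 1
        identity = matches / compared if compared else float("nan")
        results.append((start, identity))
    return results


# Toy example with a 4-bp window so the result is easy to check by hand.
print(sliding_identity("ACGTACGT", "ACGTACGA", window=4, step=4))
# -> [(0, 1.0), (4, 0.75)]
```

Plotting the identity values against window start, one curve per comparison strain, gives a plot of the kind the caption describes.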
Figure 2.
Unrooted phylogenies of the virus strains. (a) Maximum-likelihood (ML) tree for genomic regions with recombination tracts removed. (b) Neighbor-joining tree using synonymous (dS) distances in genomic regions with recombination tracts removed. (c) Neighbor-joining tree using nonsynonymous (dN) distances in genomic regions with recombination tracts removed. (d) ML tree for the region of RBD ACE2 contact residues (51 amino acids). The bootstrap values are based on 1,000 replicates. The associated distance matrices for (b) and (c) can be found in Table 3.
Figure 3.
Bias correction for dS estimates in 300-bp windows. (a) The mean of dS estimates using different methods; ML.corr and yn00.corr are the bias-corrected versions of the ML and yn00 methods, respectively. (b) Errors in dS estimates, measured as the ratio of the square root of the mean squared error (MSE) to the true dS. All the estimates are based on 10,000 simulations. ML, maximum-likelihood estimates using the f3x4 model in codeml; ML.corr, maximum-likelihood estimates with bias correction; yn00, count-based estimates of Yang and Nielsen (2000); yn00.corr, yn00 estimates with bias correction. All dS estimates are truncated at 3, which explains the reduction in MSE as the true dS approaches 3.
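The error measure in panel (b), the square root of the MSE divided by the true dS, with estimates truncated at 3, can be written out as below. This is a sketch of the metric as the caption words it; the function name and the toy inputs are illustrative and are not taken from the simulations.

```python
import math


def relative_rmse(estimates, true_ds, cap=3.0):
    """sqrt(MSE) / true dS, after truncating each estimate at `cap`."""
    if true_ds <= 0:
        raise ValueError("true dS must be positive")
    truncated = [min(e, cap) for e in estimates]
    mse = sum((e - true_ds) ** 2 for e in truncated) / len(truncated)
    return math.sqrt(mse) / true_ds


# Toy values: the estimate 10.0 is truncated to 3.0 before the error is taken.
print(relative_rmse([1.0, 10.0], true_ds=2.0))  # -> 0.5
```

Because no truncated estimate can exceed the cap, overestimates contribute at most (cap - true dS) to the error, so the measured error shrinks mechanically as the true dS gets close to the cap.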
Figure 4.
dS and dN estimates across the virus genome. (a) Pairwise dS estimates in 300-bp sliding windows for RaTG13, GD410721, and Wuhan-Hu-1. (b) Ratio of dS(Wuhan-Hu-1, RaTG13) to dS(Wuhan-Hu-1, GD410721). (c) and (d) Zoomed-in plots of dS and the dS ratio in the spike (S) protein region. The RBD contact residues (left) and furin site regions (right) are marked with gray lines. (e) Pairwise dN estimates in 300-bp sliding windows in the S protein for these strains. The dS values are truncated at 4 in the plots. The pairwise estimates were calculated on the alignment of the three sequences.

References

Altschul S. F. et al. (1990) ‘Basic Local Alignment Search Tool’, Journal of Molecular Biology, 215: 403–10.
Boni M. F. et al. (2020) ‘Evolutionary Origins of the SARS-CoV-2 Sarbecovirus Lineage Responsible for the COVID-19 Pandemic’, Nature Microbiology, 5: 1408–17.
Boni M. F., Posada D., Feldman M. W. (2007) ‘An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets’, Genetics, 176: 1035–47.
Chamary J. V., Parmley J. L., Hurst L. D. (2006) ‘Hearing Silence: Non-Neutral Evolution at Synonymous Sites in Mammals’, Nature Reviews Genetics, 7: 98–108.
Drummond A. J. et al. (2006) ‘Relaxed Phylogenetics and Dating with Confidence’, PLoS Biology, 4: e88.