From SARS and MERS CoVs to SARS-CoV-2: Moving toward more biased codon usage in viral structural and nonstructural genes

J Med Virol. 2020 Jun;92(6):660-666. doi: 10.1002/jmv.25754. Epub 2020 Mar 16.

Abstract

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an emerging virus that causes disease with fatal outcomes. This study addresses a fundamental knowledge gap by evaluating the biological and pathogenic differences of SARS-CoV-2, and the changes in it, in comparison with the coronaviruses responsible for the two prior major CoV epidemics, SARS and Middle East respiratory syndrome (MERS).

Methods: Genome composition, nucleotide analysis, codon usage indices, relative synonymous codon usage (RSCU), and effective number of codons (ENc) were analyzed in the four structural genes, Spike (S), Envelope (E), Membrane (M), and Nucleocapsid (N), and in two of the most important nonstructural genes, RNA-dependent RNA polymerase and main protease (Mpro), of SARS-CoV-2, Beta-CoV from pangolins, bat SARS, MERS, and SARS CoVs.
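For readers unfamiliar with relative synonymous codon usage, the following is a minimal sketch of how the index is commonly computed: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The abstract does not say which software the authors used, so the function name `rscu` and the toy sequence are illustrative assumptions, not the study's pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(seq):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    # Count complete, recognizable codons only; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for codon in family:
            # observed / expected, where expected = total / number of synonyms
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy example (not viral data): alanine encoded three times by GCT, once by GCA.
example = rscu("GCTGCTGCTGCA")
print(example["GCT"], example["GCA"], example["GCC"])
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.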

Results: SARS-CoV-2 prefers pyrimidine-rich codons over purine-rich ones. Most high-frequency codons ended with A or T, while the low-frequency and rare codons ended with G or C. SARS-CoV-2 structural proteins showed ENc values 5 to 20 lower than those of SARS, bat SARS, and MERS CoVs. This implies higher codon bias and higher gene expression efficiency of SARS-CoV-2 structural proteins. SARS-CoV-2 encoded the highest number of over-biased and negatively biased codons. Pangolin Beta-CoV ENc values differed little from those of SARS-CoV-2, in contrast to those of SARS, bat SARS, and MERS CoVs.
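To make the link between lower ENc and stronger codon bias concrete, here is a hedged sketch of the effective number of codons in Wright's (1990) formulation, where ENc runs from 20 (one codon per amino acid) to 61 (all synonymous codons used equally). The abstract does not state which ENc variant or tool was used, so this is an assumption; the function name, the skipping of families with non-positive homozygosity, and the error raised for very short sequences are illustrative choices.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effective_number_of_codons(seq):
    """ENc after Wright (1990) for an in-frame coding sequence (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Collect the homozygosity F of each amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for family in families.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met and Trp carry no synonymous choice; n < 2 is undefined
        squared = sum((counts[c] / n) ** 2 for c in family)
        f = (n * squared - 1) / (n - 1)
        if f > 0:  # assumption: drop families whose estimate is not positive
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Isoleucine is the only 3-fold family; if missing, average the 2- and 4-fold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("too few codons to estimate ENc")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # estimates above 61 are capped at the theoretical maximum
```

Under this formulation, a gene whose ENc is 5 to 20 lower than its homolog in another virus draws on a noticeably narrower set of synonymous codons.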

Conclusion: The extreme codon bias and lower ENc values of SARS-CoV-2, especially in the Spike, Envelope, and Mpro genes, suggest higher gene expression efficiency compared with SARS, bat SARS, and MERS CoVs.

Keywords: COVID-19; MERS CoV; SARS-CoV-2; codon bias; nonstructural protein; preferred codons.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Betacoronavirus / classification
  • Betacoronavirus / genetics*
  • Betacoronavirus / pathogenicity
  • COVID-19
  • Chiroptera / microbiology
  • Codon Usage
  • Computational Biology
  • Coronavirus 3C Proteases
  • Coronavirus Envelope Proteins
  • Coronavirus Infections / epidemiology
  • Coronavirus Infections / transmission
  • Coronavirus Infections / virology
  • Coronavirus Nucleocapsid Proteins
  • Cysteine Endopeptidases / genetics*
  • Cysteine Endopeptidases / metabolism
  • Eutheria / microbiology
  • Gene Expression
  • Humans
  • Middle East Respiratory Syndrome Coronavirus / classification
  • Middle East Respiratory Syndrome Coronavirus / genetics*
  • Middle East Respiratory Syndrome Coronavirus / pathogenicity
  • Nucleocapsid Proteins / genetics*
  • Nucleocapsid Proteins / metabolism
  • Pandemics
  • Phosphoproteins
  • Pneumonia, Viral / epidemiology
  • Pneumonia, Viral / transmission
  • Pneumonia, Viral / virology
  • RNA-Dependent RNA Polymerase / genetics*
  • RNA-Dependent RNA Polymerase / metabolism
  • SARS-CoV-2
  • Sequence Homology, Nucleic Acid
  • Severe Acute Respiratory Syndrome / epidemiology
  • Severe Acute Respiratory Syndrome / transmission
  • Severe Acute Respiratory Syndrome / virology
  • Severe acute respiratory syndrome-related coronavirus / classification
  • Severe acute respiratory syndrome-related coronavirus / genetics*
  • Severe acute respiratory syndrome-related coronavirus / pathogenicity
  • Spike Glycoprotein, Coronavirus / genetics*
  • Spike Glycoprotein, Coronavirus / metabolism
  • Viral Envelope Proteins / genetics*
  • Viral Envelope Proteins / metabolism
  • Viral Nonstructural Proteins / genetics*
  • Viral Nonstructural Proteins / metabolism

Substances

  • Coronavirus Envelope Proteins
  • Coronavirus Nucleocapsid Proteins
  • Nucleocapsid Proteins
  • Phosphoproteins
  • Spike Glycoprotein, Coronavirus
  • Viral Envelope Proteins
  • Viral Nonstructural Proteins
  • envelope protein, SARS-CoV-2
  • nucleocapsid phosphoprotein, SARS-CoV-2
  • spike protein, SARS-CoV-2
  • RNA-Dependent RNA Polymerase
  • Cysteine Endopeptidases
  • Coronavirus 3C Proteases