Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness

Science. 2022 Jun 17;376(6599):1327-1332. doi: 10.1126/science.abm1208. Epub 2022 May 24.


Repeated emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many nonspike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.
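The abstract describes PyR0 only at a high level. The sketch below is a minimal, non-Bayesian illustration of the core idea behind a multinomial logistic regression of lineage prevalence. It is not the authors' implementation, which fits these quantities hierarchically across regions with priors. It assumes a simple linear fitness model in which each lineage's growth rate is the sum of hypothetical per-mutation effects, and prevalence in one region is a softmax over lineages of an intercept plus growth rate times time. All function names, mutation labels, and numeric values are hypothetical.

```python
import math

def lineage_growth_rates(mutations_by_lineage, mutation_effects):
    """Hypothetical linear fitness model: a lineage's growth rate is the
    sum of the effects of the mutations it carries (unknown mutations add 0)."""
    return [sum(mutation_effects.get(m, 0.0) for m in muts)
            for muts in mutations_by_lineage]

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prevalence(intercepts, growth_rates, t):
    """Multinomial logistic prevalence of each lineage in a single region at
    time t: softmax over lineages of (intercept + growth_rate * t)."""
    return softmax([a + b * t for a, b in zip(intercepts, growth_rates)])

# Hypothetical example: three lineages with nested mutation sets.
lineages = [set(), {"S:D614G"}, {"S:D614G", "S:N501Y"}]
effects = {"S:D614G": 0.05, "S:N501Y": 0.08}  # illustrative values, not estimates
intercepts = [0.0, -1.0, -3.0]                 # the fitter lineages start rarer

betas = lineage_growth_rates(lineages, effects)
for day in (0.0, 30.0, 60.0):
    shares = prevalence(intercepts, betas, day)
    print(day, [round(p, 3) for p in shares])
```

In this sketch the lineage carrying both mutations starts rare but overtakes the others as time increases, because its logit grows fastest. Since growth rates are tied to shared mutation effects rather than fitted per lineage, a newly observed lineage's expected growth can be computed from its mutational profile alone, which loosely mirrors the forecasting idea stated in the abstract.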

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • COVID-19* / virology
  • Genetic Fitness*
  • Genome, Viral
  • Humans
  • Mutation
  • Regression Analysis
  • SARS-CoV-2* / genetics
  • Spike Glycoprotein, Coronavirus / chemistry
  • Spike Glycoprotein, Coronavirus / genetics

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2