FEMS Microbiol Rev. 2022 May 6;46(3):fuac003.
doi: 10.1093/femsre/fuac003.

The first three waves of the Covid-19 pandemic hint at a limited genetic repertoire for SARS-CoV-2

Trudy M Wassenaar et al. FEMS Microbiol Rev.

Abstract

The genomic diversity of SARS-CoV-2 results from a relatively low level of spontaneous mutations introduced during viral replication. With millions of SARS-CoV-2 genome sequences now available, we can begin to assess the overall genetic repertoire of this virus. We find that during 2020 a global wave of one variant went largely unnoticed, possibly because its members were divided among several sublineages (B.1.177 and sublineages B.1.177.XX). We collectively call this variant Janus. It was eventually replaced by the Alpha (B.1.1.7) variant of concern (VoC), which was in turn replaced by Delta (B.1.617.2); Delta itself might soon be replaced by a fourth pandemic wave consisting of Omicron (B.1.1.529). We observe that splitting up and redefining variant lineages over time, as happened with Janus and is now happening with Alpha, Delta and Omicron, does not help to describe the epidemic waves spreading globally. Only ∼5% of the 30 000 nucleotides of the SARS-CoV-2 genome are found to be variable. We conclude that a fourth pandemic wave driven by Omicron might not differ much from the waves caused by other VoCs, and that we may already have the tools in hand to deal effectively with this new VoC.

Keywords: Omicron; Pango lineages; SARS-CoV-2; genetic repertoire; homoplasies; mutation frequency; recombination; variants of concern (VoCs).

Figures

Figure 1.
Mutation density and genomic features of SARS-CoV-2. Panel (A) shows the positions of the open reading frames. The frequencies of amino acid changes (B) and nucleotide mutations (C) are colored from light grey (low) to dark green (high). Panel (D) shows the percentage of local inverted repeats in the Wuhan-Hu-1 reference genome, which can serve as a proxy for regions likely to form stable stem-loop structures (Jensen, Friis and Ussery 1999); these regions appear as dark blue peaks. Panel (E) shows the average AU content along the Wuhan-Hu-1 genome in windows of 100 nt. The average %AU is shown as a dashed line.
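A profile like panel (E) can be approximated with a short sliding-window calculation. The sketch below is illustrative only and rests on assumptions the caption does not state: the window advances by a configurable step, T is counted like U so that DNA-encoded sequences also work, and ambiguous bases such as N are counted in the denominator. The function names au_percent and au_profile are hypothetical.

```python
from typing import List


def au_percent(segment: str) -> float:
    """Percentage of A and U bases in a segment (T is treated as U)."""
    s = segment.upper()
    au = sum(1 for base in s if base in "AUT")
    return 100.0 * au / len(s)


def au_profile(genome: str, window: int = 100, step: int = 1) -> List[float]:
    """%AU in each window of `window` nt, advancing `step` nt at a time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [au_percent(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, step)]


if __name__ == "__main__":
    # Toy 200-nt sequence: first half only A/U, second half only G/C.
    toy = "AUAU" * 25 + "GCGC" * 25
    print(au_profile(toy, window=100, step=100))
    # The whole-sequence value corresponds to the dashed average line.
    print(au_percent(toy))
```

With step=1 the profile is a true sliding window; a step equal to the window size gives non-overlapping bins.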
Figure 2.
Matrix of all 1492 nucleotide mutations, recorded in 410 171 SARS-CoV-2 genomes, that are conserved (>95%) in at least one Pango lineage, reporting their presence in any other lineage at >0.5%. The 688 lineages are clustered based on all recorded mutations, including synonymous substitutions and deletions. Panel (A) shows the first 900 variable positions, covering orf1ab and orfb, and panel (B) the remaining 592. The lineage names listed on the right are incomplete and only indicative. The positions of B.1.1.7 and B.1.351 are indicated by black arrows (P.1 is positioned directly above B.1.1.7). Three groups of lineages with multiple sublineages that mostly share identical conserved mutations are also indicated. Major mutations occurring in multiple lineages are labeled inside the matrix. Capitals denote amino acid substitutions (numbered by amino acid position in the spliced protein) and lower-case letters denote synonymous nucleotide changes (numbered by position in the reference genome, NC_045512.2). An arrow at the bottom of panel (A) marks a position that flips between two nucleotides at low frequency in multiple lineages. The black dotted square inside panel (A) marks the region enlarged in Fig. 3. Two homoplastic mutations, Q57H and S194L, are highlighted in panel (B) with light-blue boxes.
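The selection behind this matrix (columns are mutations conserved at >95% in at least one lineage; cells are reported only above 0.5%) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each genome has already been reduced to a set of single-base mutation labels in the lower-case positional notation (for example "g11083t"), it does not handle deletions, and it omits the clustering of lineages. The names mutation_frequencies, position and conserved_matrix are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical encoding: each genome is the set of its single-base mutations
# relative to NC_045512.2, written like "g11083t" (ref base, position, alt base).
Genome = Set[str]


def mutation_frequencies(genomes: List[Genome]) -> Dict[str, float]:
    """Fraction of genomes in one lineage that carry each mutation."""
    counts = Counter(m for genome in genomes for m in genome)
    n = len(genomes)
    return {m: c / n for m, c in counts.items()}


def position(label: str) -> int:
    """Reference position of a single-base label such as 'g11083t'."""
    return int(label[1:-1])


def conserved_matrix(lineages: Dict[str, List[Genome]],
                     conserved: float = 0.95,
                     report: float = 0.005) -> Dict[str, Dict[str, float]]:
    """Lineage x mutation frequency matrix restricted to lineage-conserved mutations."""
    freqs = {name: mutation_frequencies(gs) for name, gs in lineages.items()}
    # Columns: mutations found in more than `conserved` of at least one lineage,
    # ordered by genome position.
    columns = sorted({m for f in freqs.values() for m, v in f.items() if v > conserved},
                     key=position)
    matrix = {}
    for name, f in freqs.items():
        # Cells: frequency in this lineage, or 0.0 when not above `report`.
        matrix[name] = {m: f.get(m, 0.0) if f.get(m, 0.0) > report else 0.0
                        for m in columns}
    return matrix


if __name__ == "__main__":
    toy = {
        "L1": [{"c3037t"}] * 19 + [{"c3037t", "g11083t"}],  # g11083t in 1 of 20
        "L2": [{"g11083t"}] * 10,
    }
    for lineage, row in conserved_matrix(toy).items():
        print(lineage, row)
```

In the toy data, g11083t enters the columns because it is fixed in L2, and it is then also reported for L1 at 5%, the kind of low-frequency sharing across lineages that the matrix is designed to reveal.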
Figure 3.
Zoom of the matrix for a region in orf1ab. All mutations and lineages included in this zoom are labeled on the X and Y axes, respectively. Two positions indicative of homoplasies are marked by light-blue boxes. Mutation g11083t, leading to NSP6:L37F, is present at frequencies ranging from 0.5% to 100% across lineages. Mutation c11916t, resulting in NSP7:S25L, is conserved in a number of lineages and is also found at low frequency in two unrelated lineages.
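The two naming schemes used in these figures (nucleotide positions in NC_045512.2 versus amino acid positions in the cleaved protein) can be linked with simple codon arithmetic. The sketch below assumes the standard NC_045512.2 annotation, in which the ORF1ab coding sequence starts at nucleotide 266 and nsp6 and nsp7 span polyprotein residues 3570–3859 and 3860–3942. It only holds upstream of the ORF1ab ribosomal frameshift, and NSP_RANGES and locate are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Assumed NC_045512.2 annotation: the ORF1ab CDS begins at nucleotide 266, and
# nsp6/nsp7 occupy these 1-based residue ranges of the pp1a/pp1ab polyprotein.
ORF1AB_START = 266
NSP_RANGES: Dict[str, Tuple[int, int]] = {
    "NSP6": (3570, 3859),
    "NSP7": (3860, 3942),
}


def locate(position: int) -> Optional[Tuple[str, int, int]]:
    """Map a genome position to (protein, residue in cleaved protein, codon position 1-3).

    Valid only upstream of the ORF1ab -1 ribosomal frameshift; returns None
    for positions outside the proteins listed in NSP_RANGES.
    """
    offset = position - ORF1AB_START
    if offset < 0:
        return None
    residue = offset // 3 + 1          # residue number in the polyprotein
    codon_position = offset % 3 + 1    # 1, 2 or 3 within that codon
    for name, (first, last) in NSP_RANGES.items():
        if first <= residue <= last:
            return name, residue - first + 1, codon_position
    return None


if __name__ == "__main__":
    print(locate(11083))
    print(locate(11916))
```

Under these coordinates, locate(11083) returns ("NSP6", 37, 3), a third-codon-position change in NSP6 residue 37, consistent with g11083t giving NSP6:L37F; locate(11916) returns ("NSP7", 25, 2).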
Figure 4.
Matrix of mutation frequencies in the 44 Pango lineages that were represented by >1000 genomes in the GISAID database on 1 February 2021. The number of genomes for each of these is shown in the histogram, to the right of the matrix. The conservation of multiple mutations in various lineages, often at low frequency (red), is clearly visible.
Figure 5.
Trends over time for the major VoCs and Janus. Panel (A) shows absolute numbers recorded per month, based on data downloaded from GISAID on 5 October 2021. Panel (B) shows the same data as fractions of the total, illustrating the waves of Janus, Alpha and Delta. Based on the submitted genome sequences, the other VoCs were far less significant on a global scale. In panels (A) and (B), dotted lines represent numbers for strictly defined Alpha and Delta, while solid lines represent numbers when Q lineages were added to Alpha and AY lineages to Delta. Because sequencing and submitting genomes takes time, September 2021 is not included in the graphs. Panels (C) and (D) show the fractions of lineages belonging to Janus (B.1.177 and the various B.1.177.XX lineages), based on the datasets of 1 February and 21 May, respectively. Panel (E) shows how Janus was broken up into various sublineages whose numbers vary over time, and how descendants of Alpha were reassigned to Q lineages and those of Delta to multiple AY lineages. n.a.: not applicable.
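The grouping used for the solid lines in panels (A) and (B) can be sketched as a lineage-to-wave mapping followed by per-month normalization. This is a minimal illustration under stated assumptions: the input is one (month, Pango lineage) pair per genome, Janus covers B.1.177 and B.1.177.XX, Alpha covers B.1.1.7 plus Q lineages, and Delta covers B.1.617.2 plus AY lineages. The names wave_of and monthly_fractions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Tuple


def wave_of(lineage: str) -> Optional[str]:
    """Assign a Pango lineage to a wave, folding in sublineages (solid-line grouping)."""
    if lineage == "B.1.177" or lineage.startswith("B.1.177."):
        return "Janus"
    if lineage == "B.1.1.7" or lineage.startswith("Q."):
        return "Alpha"
    if lineage == "B.1.617.2" or lineage.startswith("AY."):
        return "Delta"
    return None


def monthly_fractions(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """records: one (month 'YYYY-MM', lineage) pair per sequenced genome."""
    totals: Counter = Counter()
    per_wave: DefaultDict[str, Counter] = defaultdict(Counter)
    for month, lineage in records:
        totals[month] += 1
        wave = wave_of(lineage)
        if wave is not None:
            per_wave[month][wave] += 1
    # Fraction of all genomes from that month that belong to each wave.
    return {month: {w: n / totals[month] for w, n in per_wave[month].items()}
            for month in sorted(totals)}


if __name__ == "__main__":
    toy = [("2021-01", "B.1.177.7"), ("2021-01", "B.1.1.7"), ("2021-01", "B.1.1.7"),
           ("2021-01", "B.1.351"), ("2021-07", "AY.4"), ("2021-07", "B.1.617.2")]
    print(monthly_fractions(toy))
```

Matching only B.1.1.7 and B.1.617.2 exactly in wave_of would instead reproduce the dotted-line counts. Prefix matching on "B.1.177." (with the trailing dot) avoids accidentally capturing unrelated names that merely start with the same digits.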
Figure 6.
Matrix showing mutations accumulating in >10% of the members of selected lineages, based on the dataset of 5 October 2021. Panel (A) shows the VoCs Alpha (with Q included), Beta, Gamma, Delta (with AY included) and the Janus group. Mutations conserved in >80% of members are shown in blue (note that this threshold is more relaxed than the >95% used for their LCS). The cladogram on the left shows that Delta is less closely related to the other four lineages. Panel (B) compares the mutation frequencies of Delta (plus AY) with those of two related lineages that are not spreading widely. Mutations present in the LCS (at 95%) of Delta plus AY but absent from the other two lineages are indicated by arrows, with grey arrows for mutations previously noted in earlier, unrelated Pango lineages. The black arrow points to N:D63G, which arose in June 2020. Two mutations occurring at the same position are boxed.

References

Alouane T, Laamarti M, Essabbar A, et al. Genomic diversity and hotspot mutations in 30,983 SARS-CoV-2 genomes: moving toward a universal vaccine for the “Confined virus”? Pathogens. 2020;9:829.
Añez G, Grinev A, Chancey C, et al. Evolutionary dynamics of West Nile virus in the United States, 1999–2011: phylogeny, selection pressure and evolutionary time-scale analysis. PLoS Negl Trop Dis. 2013;7:e2245.
Biswal JK, Ranjan R, Subramaniam S, et al. Genetic and antigenic variation of foot-and-mouth disease virus during persistent infection in naturally infected cattle and Asian buffalo in India. PLoS One. 2019;14:e0214832.
Boni MF, Lemey P, Jiang X, et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 2020;5:1408–17.
Candido DS, Claro IM, de Jesus JG, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science. 2020;369:1255–60.
