Background: We investigated the genome diversity of SARS-CoV-2 during the early COVID-19 period to understand the evolution of the virus in Pakistan.
Materials and methods: We studied ninety SARS-CoV-2 strains isolated between March and October 2020. Whole-genome sequences from our laboratory and available genomes were used to investigate the phylogeny, genetic variation and mutation rates of SARS-CoV-2 strains in Pakistan. Site-specific entropy analysis compared mutation rates between strains isolated before and after June 2020.
Results: In March, strains belonging to the L, S, V and GH clades were observed, but by October only L and GH strains were present. Clade diversity was highest in Sindh and Islamabad Capital Territory and lowest in Punjab province. Initial introductions of SARS-CoV-2 GH (B.1.255, B.1) and S (A) clades were associated with overseas travelers. Additionally, GH (B.1.255, B.1, B.1.160, B.1.36), L (B, B.6, B.4), V (B.4) and S (A) clades were transmitted locally. SARS-CoV-2 genomes clustered with global strains, except for ten that matched Pakistani isolates. The RNA substitution rate was estimated at 5.86 × 10⁻⁴. The most frequent mutations were 5' UTR 241C>T, Spike glycoprotein D614G, RNA-dependent RNA polymerase (RdRp) P4715L and Orf3a Q57H. Strains isolated up to June 2020 exhibited higher overall mean and site-specific entropy than sequences from after June. Relative entropy was higher across GH than across GR and L clades. More sites were under selection pressure in GH strains, but this was not significant for any particular site.
Conclusions: The higher entropy and diversity observed in early-pandemic strains compared with later strains suggest increasing stability of the genomes in subsequent COVID-19 waves. This would likely lead to the selection of site-specific changes that are advantageous to the virus, as has been observed over the course of the pandemic.
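
To illustrate the site-specific entropy comparison named in the methods, the following is a minimal sketch that assumes entropy means Shannon entropy computed per column of a nucleotide alignment. The abstract does not state which tool or formula was used, so the function names (`site_entropy`, `per_site_entropy`, `mean_entropy`) and the toy sequences are hypothetical and are not study data.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (bits) of one alignment column.

    Assumption: gaps and ambiguous bases (anything other than A, C, G, T)
    are ignored rather than counted as a separate state.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def per_site_entropy(aligned_seqs):
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [site_entropy("".join(col)) for col in zip(*aligned_seqs)]


def mean_entropy(aligned_seqs):
    """Mean of the per-site entropies across the alignment."""
    values = per_site_entropy(aligned_seqs)
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy alignments standing in for the two sampling periods.
early = ["ACGT", "ACGA", "ATGT", "ACGC"]
late = ["ACGT", "ACGT", "ACGT", "ACGA"]

print("early mean entropy:", mean_entropy(early))
print("late mean entropy:", mean_entropy(late))
```

A column in which every strain carries the same base has zero entropy, so a higher mean over a period indicates more variable sites among the strains sampled in that period.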