CovidPhy: A tool for phylogeographic analysis of SARS-CoV-2 variation

Environ Res. 2022 Mar;204(Pt A):111909. doi: 10.1016/j.envres.2021.111909. Epub 2021 Aug 20.


The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen responsible for the coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV-2 genomes have been sequenced on a massive scale worldwide and are now available in several public genome repositories. There is much interest in developing bioinformatic tools capable of analyzing and interpreting SARS-CoV-2 variation. We have designed CovidPhy, a web interface that can process SARS-CoV-2 genome sequences supplied as plain-text FASTA or through identity codes from the Global Initiative on Sharing Avian Influenza Data (GISAID) or GenBank. CovidPhy aggregates information available in the large GISAID database (>1.49 M genomes). Sequences are first aligned against the reference sequence, and the interface then provides several sources of information: automatic classification of genomes into a pre-computed phylogeny, phylogeographic information, haplogroup/lineage frequencies, and sequencing variation, also indicating whether the genome contains known variants of concern (VOC). Additionally, CovidPhy allows users to search for variants and haplotypes of their own, and it includes a list of genomes that are good candidates for being responsible for large outbreaks worldwide, most likely mediated by important superspreading events, indicating their possible geographic epicenters and their relative impact as recorded in the GISAID database.
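
The steps summarized above (comparison against a reference, checking for a set of marker variants, and computing lineage frequencies) can be illustrated with a minimal Python sketch. This is not CovidPhy's implementation: the function names, the toy sequences, and the marker set are hypothetical, and the sketch assumes the query has already been pairwise-aligned to the reference and that only substitutions are reported.

```python
from collections import Counter


def call_variants(reference: str, query: str) -> list[str]:
    """List substitutions between two pre-aligned sequences of equal length.

    Positions are counted along the reference; gap columns in the reference
    (insertions in the query) do not advance the position counter.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    ref_pos = 0
    for ref_base, alt_base in zip(reference.upper(), query.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference; ignored here
        ref_pos += 1
        # Skip matches, deletions, and ambiguous bases in the query.
        if alt_base == ref_base or alt_base in ("-", "N"):
            continue
        variants.append(f"{ref_base}{ref_pos}{alt_base}")
    return variants


def carries_markers(variants: list[str], markers: set[str]) -> bool:
    """Return True if every marker variant is present in the genome."""
    return markers.issubset(variants)


def lineage_frequencies(labels: list[str]) -> dict[str, float]:
    """Relative frequency of each lineage label in a collection of genomes."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lineage: n / total for lineage, n in counts.items()}


# Toy data only: these are not real SARS-CoV-2 sequences or VOC definitions.
toy_reference = "ACGTACGT"
toy_query = "ACTTAC-T"
toy_markers = {"G3T"}

found = call_variants(toy_reference, toy_query)
print(found)
print(carries_markers(found, toy_markers))
print(lineage_frequencies(["A", "A", "B", "B"]))
```

A real pipeline would need a proper aligner, handling of insertions and deletions, and curated lineage definitions; the sketch only shows the shape of the comparison.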

Keywords: COVID-19; Phylogeny; RNA; SARS-CoV-2; Superspreading events; Variants of concern.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / virology
  • Databases, Genetic
  • Genome, Viral*
  • Humans
  • Internet
  • Pandemics
  • Phylogeny*
  • Phylogeography
  • SARS-CoV-2* / genetics
  • Software