The establishment of reference sequence for SARS-CoV-2 and variation analysis

J Med Virol. 2020 Jun;92(6):667-674. doi: 10.1002/jmv.25762. Epub 2020 Mar 20.


Starting around December 2019, an epidemic of pneumonia, which was named COVID-19 by the World Health Organization, broke out in Wuhan, China, and is spreading throughout the world. A new coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the Coronavirus Study Group of the International Committee on Taxonomy of Viruses, was soon found to be the cause. At present, the sensitivity of clinical nucleic acid detection is limited, and it is still unclear whether this limitation is related to genetic variation. In this study, we retrieved 95 full-length genomic sequences of SARS-CoV-2 strains from the National Center for Biotechnology Information and GISAID databases, established the reference sequence by conducting multiple sequence alignment and phylogenetic analyses, and analyzed sequence variations along the SARS-CoV-2 genome. The homology among all viral strains was generally high: 99.99% (99.91%-100%) at the nucleotide level and 99.99% (99.79%-100%) at the amino acid level. Although overall variation in open reading frame (ORF) regions is low, 13 variation sites in the 1a, 1b, S, 3a, M, 8, and N regions were identified, among which positions nt28144 in ORF 8 and nt8782 in ORF 1a showed mutation rates of 30.53% (29/95) and 29.47% (28/95), respectively. These findings suggest that there may be selective mutations in SARS-CoV-2, and that it is necessary to avoid certain regions when designing primers and probes. Establishment of the reference sequence for SARS-CoV-2 could benefit not only the biological study of this virus but also the diagnosis, clinical monitoring, and intervention of SARS-CoV-2 infection in the future.
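The per-site mutation rates quoted above are simple fractions of the 95 strains (for example, 29/95 = 30.53%). As a minimal sketch of how such rates could be tallied from an already aligned set of sequences, the Python below counts, for each alignment column, how many sequences differ from a reference. The function names (`consensus`, `variation_sites`, `percent`) and the toy sequences are hypothetical and are not taken from the study; the majority-rule consensus is only a stand-in for a reference, not the authors' alignment-and-phylogeny procedure.

```python
from collections import Counter


def consensus(aligned):
    """Majority base per alignment column (a stand-in reference, an assumption)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def variation_sites(aligned, reference):
    """Map 1-based position -> fraction of sequences that differ from reference."""
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("every sequence must match the alignment length")
    sites = {}
    for i, ref_base in enumerate(reference):
        differing = sum(1 for seq in aligned if seq[i] != ref_base)
        if differing:
            sites[i + 1] = differing / len(aligned)
    return sites


def percent(count, total):
    """Express count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Toy alignment (hypothetical data, four sequences of length four).
toy = ["ACGT", "ACGT", "ATGT", "ACGA"]
ref = consensus(toy)
print(ref, variation_sites(toy, ref))

# The two rates reported in the abstract, recomputed from their counts.
print(percent(29, 95), percent(28, 95))
```

On real data the aligned sequences would come from a multiple sequence alignment tool, and gaps or ambiguous bases would need explicit handling that this sketch omits.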

Keywords: SARS-CoV-2; homology; nucleotide; reference sequence; variation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Betacoronavirus / classification
  • Betacoronavirus / genetics*
  • Betacoronavirus / isolation & purification
  • Betacoronavirus / pathogenicity
  • COVID-19
  • COVID-19 Testing
  • Clinical Laboratory Techniques / methods
  • Coronavirus Infections / diagnosis*
  • Coronavirus Infections / epidemiology*
  • Coronavirus Infections / transmission
  • Coronavirus Infections / virology
  • Databases, Genetic
  • Genome, Viral*
  • Humans
  • Mutation Rate*
  • Open Reading Frames
  • Pandemics*
  • Phylogeny
  • Pneumonia, Viral / diagnosis*
  • Pneumonia, Viral / epidemiology*
  • Pneumonia, Viral / transmission
  • Pneumonia, Viral / virology
  • RNA, Viral / genetics
  • Reference Standards
  • SARS-CoV-2
  • Sequence Alignment
  • Sequence Homology, Nucleic Acid


Substances

  • RNA, Viral