Viruses. 2010 Aug;2(8):1804-20.
doi: 10.3390/v2081803. Epub 2010 Aug 24.

Coronavirus Genomics and Bioinformatics Analysis

Free PMC article

Patrick C Y Woo et al. Viruses.


The drastic increase in the number of coronaviruses discovered and coronavirus genomes sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera exist: Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci placed the time of the most recent common ancestor of human/civet SARS-related coronavirus at 1999-2002, with an estimated substitution rate of 4×10⁻⁴ to 2×10⁻² substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias was observed in coronaviruses, with HCoV-HKU1 showing the most extreme bias; cytosine deamination and selection of CpG-suppressed clones are the two major independent biological forces that shape this bias.
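
Two of the quantities mentioned in the abstract, G + C content and CpG suppression, can be computed directly from a genome sequence. The following is a minimal Python sketch, not the authors' pipeline: the function names `gc_content` and `cpg_obs_exp` are hypothetical, and the observed/expected CpG ratio used here, f(CG) / (f(C) × f(G)), is one common convention that the abstract does not specify.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T/U)."""
    s = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    counts = Counter(s)
    total = sum(counts[b] for b in "ACGT")  # ambiguous bases (e.g. N) are ignored
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / total


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio, f(CG) / (f(C) * f(G)).

    Assumes an unambiguous sequence; values below 1 suggest CpG suppression.
    """
    s = seq.upper().replace("U", "T")
    n = len(s)
    c, g = s.count("C"), s.count("G")
    if n < 2 or c == 0 or g == 0:
        raise ValueError("need at least one C and one G in a sequence of length >= 2")
    f_cpg = s.count("CG") / (n - 1)  # n - 1 dinucleotide positions
    return f_cpg / ((c / n) * (g / n))


if __name__ == "__main__":
    toy = "ATGC"  # toy input, not a real coronavirus sequence
    print(gc_content(toy))  # 0.5
```

Applied to a complete genome read from a FASTA file, `gc_content` would be expected to fall in the 32% to 43% range reported above, and a `cpg_obs_exp` value well below 1 would be consistent with CpG suppression.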

Keywords: bioinformatics; coronavirus; genome.


Figure 1.
Genome organizations of members in different genera of the Coronaviridae family. PL1, papain-like protease 1; PL2, papain-like protease 2; PL, papain-like protease; 3CL, chymotrypsin-like protease; Pol, RNA-dependent RNA polymerase; Hel, helicase; HE, haemagglutinin esterase; S, spike; E, envelope; M, membrane; N, nucleocapsid. TGEV, porcine transmissible gastroenteritis virus (NC_002306); PRCV, porcine respiratory coronavirus (DQ811787); FCoV, feline coronavirus (NC_012937); HCoV-229E, human coronavirus 229E (NC_002645); HCoV-NL63, human coronavirus NL63 (NC_005831); PEDV, porcine epidemic diarrhea virus (NC_003436); Sc-BatCoV 512, Scotophilus bat coronavirus 512 (NC_009657); Rh-BatCoV-HKU2, Rhinolophus bat coronavirus HKU2 (NC_009988); Mi-BatCoV-HKU8, Miniopterus bat coronavirus HKU8 (NC_010438); Mi-BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437); Mi-BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); HCoV-OC43, human coronavirus OC43 (NC_005147); BCoV, bovine coronavirus (NC_003045); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); HCoV-HKU1, human coronavirus HKU1 (NC_006577); MHV, mouse hepatitis virus (NC_006852); ECoV, equine coronavirus (NC_010327); SARSr-CoV, human SARS related coronavirus (NC_004718); SARSr-Rh-BatCoV HKU3, SARS-related Rhinolophus bat coronavirus HKU3 (NC_009694); Ty-BatCoV-HKU4, Tylonycteris bat coronavirus HKU4 (NC_009019); Pi-BatCoV-HKU5, Pipistrellus bat coronavirus HKU5 (NC_009020); Ro-BatCoV-HKU9, Rousettus bat coronavirus HKU9 (NC_009021); IBV, infectious bronchitis virus (NC_001451); TCoV, turkey coronavirus (NC_010800); SW1, beluga whale coronavirus (NC_010646); BuCoV HKU11, bulbul coronavirus HKU11 (FJ376620); ThCoV HKU12, thrush coronavirus HKU12 (NC_011549); MunCoV HKU13, munia coronavirus HKU13 (NC_011550).
Figure 2.
Phylogenetic analysis of RNA-dependent RNA polymerases (Pol) of coronaviruses with complete genome sequences available. The tree was constructed by the neighbor-joining method and rooted using Breda virus polyprotein (YP_337905). Bootstrap values were calculated from 1000 trees. A total of 1118 amino acid positions in Pol were included. The scale bar indicates the estimated number of substitutions per 20 amino acids. All abbreviations for the coronaviruses are the same as those in Figure 1.
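
The bootstrap procedure named in the Figure 2 caption resamples alignment columns with replacement and rebuilds the tree for each replicate. The sketch below illustrates only the resampling step and a simple pairwise distance in Python. It is not the authors' software: the helper names `p_distance` and `bootstrap_alignment` are hypothetical, the uncorrected p-distance is an assumption (the caption does not state the distance model), and the neighbor-joining step itself is not shown.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(next(iter(aln.values())))
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all sequences must have the same aligned length")
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence so sites stay aligned.
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


if __name__ == "__main__":
    # Toy alignment with made-up residues, for illustration only.
    toy = {"virusA": "MKLV", "virusB": "MKIV", "virusC": "MRIV"}
    rng = random.Random(0)
    replicate = bootstrap_alignment(toy, rng)
    print(p_distance(replicate["virusA"], replicate["virusB"]))
```

In a full analysis, a distance matrix would be computed for each of the 1000 replicates and passed to a neighbor-joining implementation; the bootstrap value of a branch is then the fraction of replicate trees that contain it.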

