Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1,000 Genomes Project

Genome Biol Evol. 2014 Apr;6(4):846-60. doi: 10.1093/gbe/evu054.


Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. EBV infection is associated with a number of diseases, including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed the sequences of EBV infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. Because strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads to the EBV reference genome (NC_007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.
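
To make the nucleotide diversity comparison above concrete, here is a minimal Python sketch of one common definition of nucleotide diversity (π): the mean proportion of differing sites over all pairs of aligned sequences. This is an illustration only, not the pipeline used in the study. The function name `nucleotide_diversity` and the toy alignment are hypothetical, and the choice to skip sites with gaps or ambiguous bases on a per-pair basis is an assumption.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise proportion of differing sites (pi) in an alignment.

    Assumption: sequences are pre-aligned to equal length; sites where
    either sequence of a pair has a gap or ambiguous base are skipped.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(aligned_seqs, 2):
        # Keep only sites where both sequences carry an unambiguous base.
        sites = [(x, y) for x, y in zip(a.upper(), b.upper())
                 if x in "ACGT" and y in "ACGT"]
        if not sites:
            continue
        diffs = sum(1 for x, y in sites if x != y)
        total += diffs / len(sites)
        n_pairs += 1

    if n_pairs == 0:
        raise ValueError("no comparable sites in any sequence pair")
    return total / n_pairs


# Hypothetical toy alignment of one gene from three strains.
toy_gene = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(toy_gene)
```

Applied per gene to an alignment of the available genomes, a measure like this would allow latency and lytic genes to be compared on the same scale.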

Keywords: EBV; Illumina reads; human herpesvirus 4; recombination; selection; whole-genome analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Chromosome Mapping
  • Evolution, Molecular*
  • Female
  • Genome, Viral*
  • Genome-Wide Association Study*
  • Herpesvirus 4, Human / physiology*
  • Host-Pathogen Interactions / physiology*
  • Humans
  • Male
  • Molecular Sequence Data
  • Viral Proteins / genetics*
  • Virus Latency / genetics*


Substances

  • Viral Proteins

Associated data

  • GENBANK/KF602052
  • GENBANK/KF602053
  • GENBANK/KF602054
  • GENBANK/KF602055
  • GENBANK/KF602056
  • GENBANK/KF602057
  • GENBANK/KF602058
  • GENBANK/KF602059
  • GENBANK/KF602060