Hypervariable loci in the human gut virome

Proc Natl Acad Sci U S A. 2012 Mar 6;109(10):3962-6. doi: 10.1073/pnas.1119061109. Epub 2012 Feb 21.


Genetic variation is critical in microbial immune evasion and drug resistance, but variation has rarely been studied in complex heterogeneous communities such as the human microbiome. To begin to study natural variation, we analyzed DNA viruses present in the lower gastrointestinal tract of 12 human volunteers by determining 48 billion bases of viral DNA sequence. Viral genomes mostly showed low variation, but 51 loci of ∼100 bp showed extremely high variation, so that up to 96% of the viral genomes encoded unique amino acid sequences. Some hotspots of hypervariation were in genes homologous to the bacteriophage BPP-1 viral tail-fiber gene, which is known to be hypermutagenized by a unique reverse-transcriptase (RT)-based mechanism. Unexpectedly, other hypervariable loci in our data were in previously undescribed gene types, including genes encoding predicted Ig-superfamily proteins. Most of the hypervariable loci were linked to genes encoding RTs of a single clade, which we find is the most abundant clade among gut viruses but only a minor component of bacterial RT populations. Hypervariation was targeted to 5'-AAY-3' asparagine codons, which allows maximal chemical diversification of the encoded amino acids while avoiding formation of stop codons. These findings document widespread targeted hypervariation in the human gut virome, identify previously undescribed types of genes targeted for hypervariation, clarify association with RT gene clades, and motivate studies of hypervariation in the full human microbiome.
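The claim that targeting 5'-AAY-3' asparagine codons maximizes amino-acid diversity while avoiding stop codons can be checked with a short enumeration. The Python sketch below makes a simplifying assumption: hypervariation replaces only the adenines of a target codon, each with any of the four bases, and leaves other positions unchanged. The helper names `adenine_mutants` and `summarize` are illustrative and do not come from the paper. Under this assumption, an AAY template can reach 16 NNY codons that encode 15 different amino acids and never a stop codon, because every stop codon (TAA, TAG, TGA) ends in a purine. Lysine codons (AAR) are included as a contrast, and both can give rise to stops.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first position varying slowest. "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def adenine_mutants(codon):
    """Return every codon reachable by substituting any base at the A positions.

    Assumption (illustrative only): only adenines are mutated; other positions
    are left unchanged.
    """
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    variants = set()
    for replacement in product(BASES, repeat=len(a_positions)):
        bases = list(codon)
        for pos, new_base in zip(a_positions, replacement):
            bases[pos] = new_base
        variants.add("".join(bases))
    return variants


def summarize(template):
    """Count reachable codons, distinct encoded amino acids, and whether a stop can arise."""
    variants = adenine_mutants(template)
    encoded = {CODON_TABLE[codon] for codon in variants}
    amino_acids = encoded - {"*"}
    return len(variants), len(amino_acids), "*" in encoded


# Asparagine codons (AAY) versus lysine codons (AAR) as mutagenesis templates.
for template in ("AAT", "AAC", "AAA", "AAG"):
    n_codons, n_aa, stop = summarize(template)
    print(f"{template}: {n_codons} codons, {n_aa} amino acids, stop possible: {stop}")
```

The pyrimidine at the third position of AAY is never mutated, so no stop codon can form. Each AAY template yields 16 codons but only 15 amino acids because serine is reached twice, through TCY and AGY.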

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Codon
  • Contig Mapping
  • Gastrointestinal Tract / microbiology
  • Gastrointestinal Tract / virology*
  • Genetic Variation*
  • Genome, Viral*
  • Humans
  • Metagenome
  • Models, Genetic
  • Molecular Sequence Data
  • Mutagenesis
  • Open Reading Frames
  • RNA, Ribosomal, 16S / metabolism
  • Sequence Analysis, DNA

Substances

  • Codon
  • RNA, Ribosomal, 16S