Exploiting CpG hypermutability to identify phenotypically significant variation within human protein-coding genes

Genome Biol Evol. 2011:3:938-49. doi: 10.1093/gbe/evr021. Epub 2011 Mar 11.

Abstract

The CpG dinucleotide is disproportionately represented in human genetic variation due to the hypermutability of 5-methylcytosine (5mC). We exploit this hypermutability and a novel codon substitution model to identify candidate functionally important exonic nucleotides. Population genetic theory suggests that codon positions at which CpG occurs at high frequency across species, despite its hypermutability, are under stronger purifying selection. Using a phylogeny-based maximum likelihood inference framework, we applied codon substitution models with context-dependent parameters to measure the mutagenic and selective processes affecting CpG dinucleotides within exonic sequence. The suitability of these models was validated on >2,000 protein-coding genes from a naturally occurring biological control: four yeast species that do not methylate their DNA. As expected, our analyses of yeast revealed no evidence for an elevated CpG transition rate or for substitution suppression affecting CpG-containing codons. Our analyses of >12,000 protein-coding genes from four primate lineages confirm the systemic influence of 5mC hypermutability on the divergence of these genes. After adjusting for the confounding influences of mutation and the properties of the encoded amino acids, we confirmed that CpG-containing codons are under greater purifying selection in primates. Genes with significant evidence of enhanced suppression of nonsynonymous CpG changes were also significantly enriched in Online Mendelian Inheritance in Man (OMIM). We developed a method for ranking candidate phenotypically influential CpG positions in human genes. Application of this method indicates that of the ∼1 million exonic CpG dinucleotides within humans, ∼20% are strong candidates for both hypermutability and disease association.
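
To make the notion of synonymous versus nonsynonymous CpG changes concrete, the following is a minimal Python sketch, not the codon substitution model described in the abstract. It scans an in-frame coding sequence for CpG dinucleotides and, for each one, lists the two transitions expected from 5mC deamination (C→T on the given strand, and G→A, which reflects C→T on the opposite strand), labelling each by its effect on the encoded amino acid. The function name `cpg_transitions`, the example sequence, and the choice to count changes to or from a stop codon as nonsynonymous are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def cpg_transitions(cds):
    """List the deamination-type transitions at every CpG in an in-frame CDS.

    Hypothetical helper. Returns tuples of
    (cpg_index, mutated_index, old_codon, new_codon, effect),
    using 0-based indices into the sequence.
    """
    cds = cds.upper()
    results = []
    for i in range(len(cds) - 1):
        if cds[i:i + 2] != "CG":
            continue
        # C->T on this strand; G->A mirrors C->T on the opposite strand.
        for pos, new_base in ((i, "T"), (i + 1, "A")):
            start = pos - pos % 3              # first base of the affected codon
            old_codon = cds[start:start + 3]
            if len(old_codon) < 3:             # skip a trailing partial codon
                continue
            offset = pos - start
            new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
            same = CODON_TABLE[old_codon] == CODON_TABLE[new_codon]
            # Assumption: stop-codon gains and losses count as nonsynonymous.
            effect = "synonymous" if same else "nonsynonymous"
            results.append((i, pos, old_codon, new_codon, effect))
    return results


if __name__ == "__main__":
    for row in cpg_transitions("ATGGCGTAA"):   # codons: ATG GCG TAA
        print(row)
```

In the example sequence the single CpG lies at codon positions 2 and 3 of GCG (alanine): the C→T change gives GTG (valine, nonsynonymous), whereas the G→A change gives GCA (alanine, synonymous). A CpG that spans a codon boundary is handled the same way, with each transition scored against the codon that contains the mutated base.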

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5-Methylcytosine / chemistry
  • Animals
  • Codon
  • CpG Islands / genetics*
  • DNA Methylation
  • Dinucleotide Repeats / genetics*
  • Disease / genetics
  • Evolution, Molecular
  • Exons / genetics
  • Genetic Association Studies
  • Humans
  • Models, Statistical
  • Mutagenesis*
  • Mutation
  • Open Reading Frames / genetics*
  • Phenotype
  • Phylogeny
  • Polymorphism, Single Nucleotide
  • Yeasts / genetics

Substances

  • Codon
  • 5-Methylcytosine