Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes

Mol Biol Evol. 2010 Jan;27(1):177-92. doi: 10.1093/molbev/msp219.


Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has long been assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraint on most sequence types in hominids than in murids, a group that is thought to have a higher effective population size. We found that this is the case for many of the sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that in murids. Surprisingly, however, we observed higher constraint for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than in murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio) that take 4-fold sites as a neutral standard may give biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate is U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
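
The step from U = 4.2 to a load of about 99% can be checked with a short calculation. The sketch below assumes the standard result for multiplicative fitness effects, in which mean fitness relative to a mutation-free genotype is exp(-U), so that the load is L = 1 - exp(-U). It also includes a constraint function using a common definition, C = 1 - (observed rate / neutral rate); the abstract does not state the exact estimator used, so that formula, the function names, and the example rates are illustrative assumptions rather than the authors' method.

```python
import math


def mutational_load(u: float) -> float:
    """Mutational load under multiplicative fitness effects.

    Assumes mean relative fitness is exp(-U), so L = 1 - exp(-U),
    where U is the genomic deleterious mutation rate.
    """
    return 1.0 - math.exp(-u)


def constraint(observed_rate: float, neutral_rate: float) -> float:
    """Selective constraint as the fraction of mutations removed by selection.

    Assumed definition: C = 1 - observed_rate / neutral_rate, where the
    neutral rate would come from a putatively neutral reference class
    (the abstract uses ancestral transposable elements for comparison).
    """
    if neutral_rate <= 0:
        raise ValueError("neutral_rate must be positive")
    return 1.0 - observed_rate / neutral_rate


if __name__ == "__main__":
    # U = 4.2 is the estimate reported in the abstract.
    print(f"Load at U = 4.2: {mutational_load(4.2):.3f}")  # prints 0.985

    # Hypothetical rates, chosen only to illustrate a constraint of one-fifth.
    print(f"Constraint: {constraint(0.8, 1.0):.2f}")
```

With U = 4.2 the load comes out near 0.985, which matches the abstract's figure of about 99% after rounding.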

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA, Intergenic
  • Evolution, Molecular*
  • Genome*
  • Genome, Human
  • Hominidae
  • Humans
  • Mice
  • Models, Genetic
  • Mutation*
  • Oligonucleotide Array Sequence Analysis
  • Open Reading Frames
  • Sequence Alignment

