Defining Functional Genic Regions in the Human Genome through Integration of Biochemical, Evolutionary, and Genetic Evidence

Mol Biol Evol. 2017 Jul 1;34(7):1788-1798. doi: 10.1093/molbev/msx101.


The human genome is dominated by large tracts of DNA with extensive biochemical activity but no known function. In particular, it is well established that transcriptional activities are not restricted to known genes. However, whether this intergenic transcription represents activity with functional significance or noise is under debate, highlighting the need for an effective method of defining functional genomic regions. Moreover, these discoveries raise the question whether genomic regions can be defined as functional based solely on the presence of biochemical activities, without considering evolutionary (conservation) and genetic (effects of mutations) evidence. Here, computational models integrating genetic, evolutionary, and biochemical evidence are established that provide reliable predictions of human protein-coding and RNA genes. Importantly, in addition to sequence conservation, biochemical features allow accurate predictions of genic sequences with phenotypic evidence under strong purifying selection, suggesting that they can be used as an alternative measure of selection. Moreover, 18.5% of annotated noncoding RNAs exhibit higher degrees of similarity to phenotype genes and, thus, are likely functional. However, 64.5% of noncoding RNAs appear to belong to a sequence class of their own, and the remaining 17% are more similar to pseudogenes and random intergenic sequences that may represent noisy transcription.
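As a rough illustration of the kind of integrative classification described above, the sketch below trains a small bagged ensemble of one-feature decision stumps, a simplified stand-in for a random forest, on synthetic data. The feature names (`conservation`, `transcription`, `chromatin_activity`), the value ranges, and all function names are hypothetical and chosen only for illustration; they are not the features, data, or pipeline used in the study.

```python
import random

# Hypothetical feature names standing in for evolutionary and biochemical evidence.
FEATURES = ["conservation", "transcription", "chromatin_activity"]

def make_synthetic(n_per_class, rng):
    """Toy data: 'functional' rows score high on every feature, 'background' rows low."""
    pos = [[rng.uniform(0.7, 1.0) for _ in FEATURES] for _ in range(n_per_class)]
    neg = [[rng.uniform(0.0, 0.3) for _ in FEATURES] for _ in range(n_per_class)]
    return pos, neg

def fit_stump(pos, neg, feature):
    """Pick the threshold on one feature that misclassifies the fewest training rows."""
    values = sorted({row[feature] for row in pos + neg})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(thr):
        # A row is called 'functional' when its feature value exceeds the threshold.
        return sum(r[feature] <= thr for r in pos) + sum(r[feature] > thr for r in neg)

    return min(candidates, key=errors)

def fit_forest(pos, neg, n_trees, rng):
    """Train stumps on class-stratified bootstrap samples, one random feature each."""
    forest = []
    for _ in range(n_trees):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        feature = rng.randrange(len(FEATURES))
        forest.append((feature, fit_stump(boot_pos, boot_neg, feature)))
    return forest

def functional_score(forest, row):
    """Fraction of stumps voting 'functional' for one candidate region."""
    return sum(row[f] > thr for f, thr in forest) / len(forest)

rng = random.Random(0)
pos, neg = make_synthetic(40, rng)
forest = fit_forest(pos, neg, n_trees=25, rng=rng)

print(functional_score(forest, [0.9, 0.8, 0.95]))  # high on all features -> 1.0
print(functional_score(forest, [0.1, 0.2, 0.05]))  # low on all features -> 0.0
print(functional_score(forest, [0.9, 0.1, 0.9]))   # mixed evidence, intermediate vote
```

The vote fraction plays the role of a per-region functional likelihood: regions with mixed evidence receive intermediate scores, which is one way a model of this kind could separate sequences resembling phenotype genes from those resembling pseudogenes or random intergenic sequence.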

Keywords: chromatin state; conservation; functional genomic region; random forest classification.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Biological Evolution
  • Computational Biology / methods*
  • Computer Simulation
  • Conserved Sequence / genetics
  • DNA, Intergenic / genetics*
  • Evolution, Molecular
  • Genome, Human
  • Genomics / methods
  • Humans
  • Pseudogenes / genetics
  • RNA
  • RNA, Untranslated
  • Selection, Genetic
  • Sequence Analysis, DNA / methods*
  • Transcription, Genetic


Substances

  • DNA, Intergenic
  • RNA, Untranslated
  • RNA