The interplay between evolution, regulation and tissue specificity in the Human Hereditary Diseasome

BMC Genomics. 2010 Dec 2;11 Suppl 4(Suppl 4):S23. doi: 10.1186/1471-2164-11-S4-S23.

Abstract

Background: Human disease genes can be distinguished from essential (embryonically lethal) and non-disease genes using gene attributes. Such attributes include gene age, tissue specificity of expression, regulatory capacity, sequence length, rate of sequence variation and capacity for interaction. The resulting information has been used to inform data mining approaches that seek to identify novel disease genes. Given the dynamic nature of this field and the rapid rise in relevant information, we have taken a single integrated mining approach to explore relationships among gene attributes and thereby characterise evolutionary trends associated with disease genes.

Results: An all-against-all cross comparison of 2,522 disease gene attributes revealed significant relationships between the age, disease association and expression pattern of genes and the tissues within which they are expressed. We found that the over-representation of disease genes among old genes holds for tissue-specific genes, but the correlation between age and disease association vanishes when conditioning on tissue specificity. Of the 32 tissues studied, the genes expressed in pancreas are on average older than those expressed in any other tissue, while the testis expresses the lowest proportion of old genes. Following a focussed analysis of the impact of the regulatory apparatus on the evolution of disease genes, we show that regulators, comprising transcription factors and post-translationally modified proteins, are over-represented among ancient disease genes. In addition, we show that the proportion of regulator genes is affected by gene age among disease genes and by tissue specificity among non-disease genes. Finally, using 55,606 true-positive gene interactions, we find that old disease genes interact with other old disease genes and that interacting new genes interact with genes originating from higher phylostrata.
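To make the idea of "conditioning on tissue specificity" concrete, the following is a minimal sketch of a stratified over-representation test. It is an illustration only: the abstract does not state which statistical test the authors used, so the one-sided Fisher exact test here is an assumption, and the function name `fisher_enrichment`, the stratum labels and all counts in `toy_counts` are hypothetical values chosen for demonstration, not data from the study.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher exact test for over-representation.

    2x2 table (hypothetical layout):
                   disease   non-disease
        old           a           b
        young         c           d

    Returns (odds_ratio, p_value), where p_value is the probability of
    seeing at least `a` old genes among the disease genes under the
    hypergeometric null with fixed margins.
    """
    n_old = a + b
    n_disease = a + c
    total = a + b + c + d
    upper = min(n_old, n_disease)
    tail = sum(
        comb(n_old, k) * comb(total - n_old, n_disease - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(total, n_disease)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value


# Toy counts, invented for illustration only (NOT results from the paper).
# Each tuple is (old & disease, old & non-disease, young & disease, young & non-disease).
toy_counts = {
    "tissue_specific": (30, 70, 10, 90),
    "broadly_expressed": (20, 80, 20, 80),
}

# Test each stratum separately, i.e. condition on tissue specificity.
results = {name: fisher_enrichment(*cells) for name, cells in toy_counts.items()}

for name, (odds_ratio, p_value) in results.items():
    print(f"{name}: odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.4g}")
```

Testing each stratum separately, rather than pooling all genes into one table, is what allows an association seen in one class of genes to be told apart from one that holds across the board.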

Conclusion: This study supports the non-random nature of the human diseasome. We have identified a variety of distinct features and correlations with other molecular attributes that can be used to distinguish the set of disease-causing genes. This was achieved by harnessing the power of mining large-scale datasets from OMIM and other databases. Ultimately, such knowledge may contribute to the identification of novel human disease genes and an enhanced understanding of human biology.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Data Mining
  • Databases, Genetic*
  • Evolution, Molecular*
  • Gene Expression
  • Genetic Diseases, Inborn / genetics*
  • Genetic Variation
  • Humans
  • Male
  • Organ Specificity / genetics*
  • Protein Processing, Post-Translational / genetics
  • Systems Biology
  • Transcription Factors / genetics

Substances

  • Transcription Factors