Predicting essential genes based on network and sequence analysis

Mol Biosyst. 2009 Dec;5(12):1672-8. doi: 10.1039/B900611G.

Abstract

Essential genes are indispensable to the viability of an organism. Identifying and analyzing essential genes is key to understanding the systems-level organization of living cells. The ability to predict these genes in pathogens is also of great importance for directed drug development. Global analysis of protein interaction networks provides an effective way to elucidate the relationships between genes. It has been found that essential genes tend to be highly connected and generally have more interactions than nonessential ones. Using recent large-scale identifications of essential genes and protein-protein interactions in Saccharomyces cerevisiae and Escherichia coli, we systematically investigated the topological properties of essential and nonessential genes in the protein-protein interaction networks. Essential genes tend to play topologically more important roles in these networks, and many topological features were found to discriminate, with statistical significance, between essential and nonessential genes. In addition, we examined sequence properties such as open reading frame length, strand, and phyletic retention for their association with gene essentiality. Employing the topological features of the protein interaction network together with the sequence properties, we built a machine learning classifier capable of predicting essential genes. Computational prediction of essential genes circumvents expensive and difficult experimental screens and will help antimicrobial drug development.
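
As a rough illustration of what "topological features" can mean here, the sketch below computes two common per-gene measures from an undirected interaction list: degree (the number of interaction partners) and the local clustering coefficient. The abstract does not list the exact features used, so the choice of clustering coefficient is an assumption, and the gene names, the edge list, and the helper functions are hypothetical, made up only for this example.

```python
from collections import defaultdict

# Hypothetical toy interaction list; gene names and edges are invented
# for illustration and do not come from the paper's data sets.
INTERACTIONS = [
    ("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
    ("geneB", "geneC"), ("geneD", "geneE"),
]


def build_network(edges):
    """Return an undirected adjacency map: gene -> set of partners."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree(adj, gene):
    """Number of distinct interaction partners of a gene."""
    return len(adj.get(gene, ()))


def clustering_coefficient(adj, gene):
    """Fraction of a gene's partner pairs that also interact with each other."""
    nbrs = sorted(adj.get(gene, ()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def topological_features(adj):
    """Assumed feature set: one dict of features per gene."""
    return {
        g: {"degree": degree(adj, g),
            "clustering": clustering_coefficient(adj, g)}
        for g in adj
    }


network = build_network(INTERACTIONS)
features = topological_features(network)
```

Feature dictionaries like these could then be combined with sequence properties (for example, open reading frame length) and passed to any standard classifier. Which learner and which feature set the authors actually used is not stated in the abstract.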

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Escherichia coli / genetics
  • Gene Regulatory Networks*
  • Genes, Essential*
  • Genome, Bacterial / genetics
  • Genome, Fungal / genetics
  • Genomics / methods*
  • Models, Genetic
  • Models, Statistical*
  • ROC Curve
  • Saccharomyces cerevisiae / genetics
  • Sequence Analysis, DNA / methods*