Identifying essential genes in bacterial metabolic networks with machine learning methods

BMC Syst Biol. 2010 May 3;4:56. doi: 10.1186/1752-0509-4-56.

Abstract

Background: Identifying essential genes in bacteria helps to identify potential drug targets and supports an understanding of the minimal requirements for a synthetic cell. However, experimentally assaying gene essentiality is resource-intensive and not feasible for all bacterial organisms, in particular infectious ones.

Results: We developed a machine learning technique that uses experimental data from genome-wide knock-out screens in one bacterial organism to infer the essential genes of another, related bacterial organism. We used a broad variety of topological features, sequence characteristics and co-expression properties potentially associated with essentiality, such as flux deviations, centrality, codon frequencies of the sequences, co-regulation and phyletic retention. An organism-wise cross-validation on bacterial species yielded reliable results with good accuracy (area under the receiver operating characteristic curve of 75% to 81%). Finally, the method was applied to drug target prediction for Salmonella typhimurium. We compared our predictions to the viability of experimental knock-outs of S. typhimurium and identified 35 enzymes that are highly relevant as potential drug targets. Specifically, we detected promising drug targets in the non-mevalonate pathway.
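The organism-wise cross-validation described above can be illustrated with a minimal sketch: each organism is held out in turn, a scorer is fitted on the genes of the remaining organisms, and the area under the ROC curve is computed on the held-out organism. Everything specific here is an assumption made for illustration, not the authors' implementation: the organism names (`org_A`, `org_B`), the two-feature toy vectors and the nearest-centroid scorer are hypothetical stand-ins for the real features and the classifier used in the paper.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_organism_out(genes):
    """Yield (held_out_organism, training_genes, test_genes) for each organism."""
    by_org = defaultdict(list)
    for gene in genes:
        by_org[gene["organism"]].append(gene)
    for org in sorted(by_org):
        train = [g for o, gs in by_org.items() if o != org for g in gs]
        yield org, train, by_org[org]


def fit_centroid_scorer(train):
    """Hypothetical stand-in classifier: higher score = closer to the essential centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ess = centroid([g["features"] for g in train if g["essential"] == 1])
    non = centroid([g["features"] for g in train if g["essential"] == 0])
    return lambda features: sq_dist(features, non) - sq_dist(features, ess)


def organism_wise_auc(genes):
    """Train on all other organisms, evaluate AUC on the held-out one."""
    results = {}
    for org, train, test in leave_one_organism_out(genes):
        score = fit_centroid_scorer(train)
        labels = [g["essential"] for g in test]
        results[org] = roc_auc(labels, [score(g["features"]) for g in test])
    return results


# Made-up toy data: two hypothetical organisms, two hypothetical features per gene.
TOY_GENES = [
    {"organism": "org_A", "features": [0.9, 0.8], "essential": 1},
    {"organism": "org_A", "features": [0.2, 0.1], "essential": 0},
    {"organism": "org_A", "features": [0.8, 0.7], "essential": 1},
    {"organism": "org_B", "features": [0.85, 0.9], "essential": 1},
    {"organism": "org_B", "features": [0.1, 0.3], "essential": 0},
    {"organism": "org_B", "features": [0.3, 0.2], "essential": 0},
]

if __name__ == "__main__":
    for organism, auc in organism_wise_auc(TOY_GENES).items():
        print(organism, auc)
```

Splitting by organism rather than by gene means the held-out organism contributes no labels to training, which mirrors the situation in which no knock-out screen exists for the query organism.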

Conclusions: Using elaborate features that characterize network topology, sequence information and microarray data makes it possible to predict the essential genes of a query organism from a related bacterial reference organism, without any knowledge about the essentiality of genes in the query organism. In general, such a method is beneficial for inferring drug targets when experimental data from genome-wide knockout screens are not available for the investigated organism.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Escherichia coli / metabolism
  • Fatty Acids / metabolism
  • Gene Expression Profiling
  • Gene Expression Regulation, Bacterial
  • Genes, Bacterial
  • Genes, Essential*
  • Genome, Bacterial
  • Genomics
  • Metabolic Networks and Pathways / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Pseudomonas aeruginosa / metabolism
  • ROC Curve
  • Salmonella typhimurium / genetics
  • Systems Biology

Substances

  • Fatty Acids