Probabilistic inference of molecular networks from noisy data sources

Bioinformatics. 2004 May 22;20(8):1205-13. doi: 10.1093/bioinformatics/bth061. Epub 2004 Feb 10.


Information on molecular networks, such as networks of interacting proteins, comes from diverse sources that differ markedly in the distribution and quantity of their errors. Here, we introduce a probabilistic model for predicting protein interactions from heterogeneous data sources. The model describes the stochastic generation of protein-protein interaction networks with real-world properties, as well as the generation of two heterogeneous sources of protein-interaction information: research results automatically extracted from the literature and yeast two-hybrid experiments. Based on the domain composition of proteins, we use the model to predict interactions for pairs of proteins for which no experimental data are available. We further explore the limits of prediction when the experimental data cover only part of the underlying protein networks. This approach can be extended naturally to include other types of biological data sources.
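
The general idea of weighing evidence from sources with different error profiles can be illustrated with a minimal sketch. This is not the model described in the abstract, which is a generative model of whole networks and uses protein domain composition; it is a much simpler Bayes-rule combination that assumes the sources are conditionally independent given the true interaction status. The function name, the source labels, the prior, and all error rates below are hypothetical values chosen for illustration only.

```python
def posterior_interaction(prior, observations, error_rates):
    """Posterior probability that two proteins interact, given noisy reports.

    prior        -- assumed prior probability of an interaction (hypothetical)
    observations -- dict: source name -> True if the source reports an interaction
    error_rates  -- dict: source name -> (false_positive_rate, false_negative_rate)

    Assumes sources are conditionally independent given the true status.
    """
    like_true = 1.0   # P(observations | proteins interact)
    like_false = 1.0  # P(observations | proteins do not interact)
    for source, reported in observations.items():
        fp, fn = error_rates[source]
        if reported:
            like_true *= 1.0 - fn
            like_false *= fp
        else:
            like_true *= fn
            like_false *= 1.0 - fp
    numerator = prior * like_true
    return numerator / (numerator + (1.0 - prior) * like_false)


# Hypothetical error rates, not taken from the paper.
rates = {
    "literature": (0.05, 0.40),  # text mining: assumed 5% FP, 40% FN
    "two_hybrid": (0.10, 0.60),  # yeast two-hybrid: assumed 10% FP, 60% FN
}
prior = 0.01

p_none = posterior_interaction(prior, {}, rates)
p_one = posterior_interaction(prior, {"literature": True}, rates)
p_both = posterior_interaction(
    prior, {"literature": True, "two_hybrid": True}, rates
)
print(p_none, p_one, p_both)
```

With these assumed numbers, a pair reported by both sources receives a higher posterior than a pair reported by one, which in turn exceeds the prior. A pair with no observations keeps the prior, which is the situation where a model such as the one in the abstract would fall back on other information, for example domain composition.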

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Cell Physiological Phenomena*
  • Database Management Systems
  • Databases, Bibliographic*
  • Databases, Protein*
  • Information Storage and Retrieval / methods*
  • Models, Biological*
  • Models, Statistical
  • Periodicals as Topic
  • Protein Interaction Mapping / methods*
  • Sequence Analysis, Protein / methods
  • Signal Transduction / physiology*
  • Stochastic Processes
  • Two-Hybrid System Techniques
  • Yeasts / metabolism