Querying pathways in protein interaction networks based on hidden Markov models

J Comput Biol. 2009 Feb;16(2):145-57. doi: 10.1089/cmb.2008.02TT.

Abstract

High-throughput techniques for measuring protein interactions have enabled the systematic study of complex protein networks. Comparing the networks of different organisms and identifying their common substructures can lead to a better understanding of the regulatory mechanisms underlying various cellular functions. To facilitate such comparisons, we present an efficient framework based on hidden Markov models (HMMs) that can be used for finding homologous pathways in a network of interest. Given a query path, our method identifies the top k matching paths in the network, which may contain any number of consecutive insertions and deletions. We demonstrate that our method is able to identify biologically significant pathways in protein interaction networks obtained from the DIP database, and that the retrieved paths are closer to the curated pathways in the KEGG database than the results from previous approaches. Unlike most existing algorithms, which suffer from exponential time complexity, our algorithm has a polynomial complexity that grows linearly with the query size. This enables the search for very long paths with more than 10 proteins within a few minutes on a desktop computer. A software program implementing the algorithm is available upon request from the authors.
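To make the idea of a top-k path query concrete, the following is a minimal sketch of a Viterbi-style dynamic program over a network. It is not the authors' HMM: it aligns each query protein to exactly one network protein, so it does not model insertions or deletions, and the function name `top_k_paths`, the `edges` adjacency dictionary, and the `sim` log-score table are all hypothetical names introduced for illustration. Pruning to k candidates per node while also forbidding repeated nodes makes this a heuristic rather than an exact search.

```python
import heapq


def top_k_paths(query, edges, sim, k=3):
    """Return up to k best-scoring simple paths matched one-to-one to `query`.

    query: list of query protein names (hypothetical input format)
    edges: dict mapping a network node to an iterable of its neighbors
    sim:   dict mapping (query_protein, network_protein) to a log-score;
           pairs missing from the dict are treated as disallowed matches
    """
    # best[v] holds up to k (score, path) candidates for partial paths ending at v.
    best = {
        v: [(sim[(query[0], v)], (v,))]
        for v in edges
        if (query[0], v) in sim
    }
    for q in query[1:]:
        nxt = {}
        for u, candidates in best.items():
            for v in edges.get(u, ()):
                if (q, v) not in sim:
                    continue  # v is not an allowed match for this query protein
                for score, path in candidates:
                    if v in path:
                        continue  # keep paths simple (no repeated nodes)
                    nxt.setdefault(v, []).append((score + sim[(q, v)], path + (v,)))
        # Keep only the k best candidates per end node before the next step.
        best = {v: heapq.nlargest(k, c) for v, c in nxt.items()}
    all_candidates = [c for cands in best.values() for c in cands]
    return heapq.nlargest(k, all_candidates)


# Toy example with made-up integer log-scores.
edges = {"X": ["Y", "W"], "Y": ["X", "Z"], "Z": ["Y"], "W": ["X"]}
sim = {("a", "X"): -1, ("b", "Y"): -2, ("b", "W"): -5, ("c", "Z"): -3, ("c", "X"): -1}
print(top_k_paths(["a", "b", "c"], edges, sim, k=2))  # [(-6, ('X', 'Y', 'Z'))]
```

Each of the query steps scans every edge once per retained candidate, so the work in this sketch grows linearly with the query length for a fixed network and fixed k, which is the scaling behavior the abstract describes for the authors' method.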

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Animals
  • Computer Simulation
  • Databases, Factual
  • Drosophila melanogaster
  • Humans
  • Markov Chains*
  • Protein Interaction Mapping*
  • Proteins* / chemistry
  • Proteins* / metabolism
  • Reproducibility of Results
  • Signal Transduction*
  • Software

Substances

  • Proteins