An improved hidden Markov model for transmembrane protein detection and topology prediction and its applications to complete genomes

Bioinformatics. 2005 May 1;21(9):1853-8. doi: 10.1093/bioinformatics/bti303. Epub 2005 Feb 2.


Motivation: Knowledge of the transmembrane helical topology can help identify binding sites and infer functions for membrane proteins. However, because membrane proteins are hard to solubilize and purify, only a very small number of them have experimentally determined structures and topologies. This has motivated various computational methods for predicting the topology of membrane proteins.

Results: We present an improved hidden Markov model, TMMOD, for the identification and topology prediction of transmembrane proteins. Our model uses TMHMM as a prototype, but differs from TMHMM in the architecture of the submodels for loops on both sides of the membrane and in the model training procedure. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD achieved an accuracy of 84% for topology and 89% for locations. When used to discriminate transmembrane proteins from non-transmembrane proteins, particularly those containing signal peptides, TMMOD consistently produces fewer false positives than TMHMM. Application of TMMOD to a collection of complete genomes shows that predicted membrane proteins account for approximately 20-30% of all genes in those genomes, and that the topology in which both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans.
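As a rough illustration of how a hidden Markov model can assign a topology to a sequence, the sketch below runs Viterbi decoding over a toy four-state model (inside loop, helix crossing outward, outside loop, helix crossing inward). It is not the TMMOD or TMHMM architecture: the state set, the three residue classes, and every probability are assumptions chosen only to make the example small and runnable.

```python
import math

# Toy topology HMM. All states and probabilities below are illustrative
# assumptions, not parameters from TMMOD or TMHMM.
STATES = ["i", "Mio", "o", "Moi"]
LABEL = {"i": "i", "Mio": "M", "o": "o", "Moi": "M"}
START = {"i": 0.5, "o": 0.5}  # a sequence must begin in a loop
TRANS = {
    "i": {"i": 0.9, "Mio": 0.1},    # inside loop -> helix heading outward
    "Mio": {"Mio": 0.9, "o": 0.1},  # that helix must exit to the outside
    "o": {"o": 0.9, "Moi": 0.1},    # outside loop -> helix heading inward
    "Moi": {"Moi": 0.9, "i": 0.1},  # that helix must exit to the inside
}
HYDROPHOBIC = set("AILMFVWC")
POSITIVE = set("KR")
MEMBRANE = {"hydrophobic": 0.85, "positive": 0.02, "other": 0.13}
EMIT = {
    "i": {"hydrophobic": 0.15, "positive": 0.30, "other": 0.55},
    "o": {"hydrophobic": 0.15, "positive": 0.10, "other": 0.75},
    "Mio": MEMBRANE,
    "Moi": MEMBRANE,
}


def residue_class(aa):
    """Map an amino acid letter to one of three coarse classes."""
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in POSITIVE:
        return "positive"
    return "other"


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_topology(seq):
    """Return the most probable label string ('i', 'M', 'o') for seq."""
    if not seq:
        return ""
    classes = [residue_class(aa) for aa in seq]
    score = {s: _log(START.get(s, 0.0)) + _log(EMIT[s][classes[0]])
             for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for c in classes[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: score[p] + _log(TRANS[p].get(s, 0.0)))
            new_score[s] = (score[prev] + _log(TRANS[prev].get(s, 0.0))
                            + _log(EMIT[s][c]))
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join(LABEL[s] for s in path)
```

For the made-up sequence `"KRKRS" + "L" * 20 + "SGSGS"`, `viterbi_topology` labels the first five residues `i`, the twenty leucines `M`, and the last five `o`. Splitting the membrane state in two is what forces helices to alternate between the two sides, and the higher weight on K and R in the inside loop is a crude stand-in for the positive-inside tendency of cytoplasmic loops.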


Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Artificial Intelligence*
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Markov Chains
  • Membrane Proteins / analysis
  • Membrane Proteins / chemistry*
  • Membrane Proteins / genetics*
  • Models, Chemical*
  • Models, Molecular*
  • Models, Statistical
  • Molecular Sequence Data
  • Protein Conformation
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Membrane Proteins