Meta-MEME: motif-based hidden Markov models of protein families

Comput Appl Biosci. 1997 Aug;13(4):397-406. doi: 10.1093/bioinformatics/13.4.397.

Abstract

Motivation: Modeling families of related biological sequences using hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because these mathematical models are complex, they require a relatively large training set to recognize a given family accurately. For families with few known sequences, a standard linear HMM contains too many parameters to be trained adequately.

Results: This work attempts to solve that problem by generating smaller HMMs that precisely model only the conserved regions of the family. These HMMs are constructed from motif models generated by the expectation-maximization (EM) algorithm using the MEME software. Because motif-based HMMs have relatively few parameters, they can be trained using smaller data sets. Studies of short-chain alcohol dehydrogenases and 4Fe-4S ferredoxins support the claim that motif-based HMMs exhibit increased sensitivity and selectivity in database searches, especially when training sets contain few sequences.
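
To make the parameter-count argument concrete, the following is a rough sketch and not the architecture described in the paper. It assumes a simplified linear HMM with one match state and one insert state per alignment column, and a motif-based HMM with one match state per motif column plus one spacer state before, between, and after the motifs. Only emission parameters are counted (transitions are ignored), and the family length and motif widths are hypothetical values chosen for illustration.

```python
ALPHABET_SIZE = 20  # the 20 standard amino acids


def emission_params(n_states, alphabet_size=ALPHABET_SIZE):
    """Free emission parameters for n_states emitting states.

    Each state holds a probability distribution over the alphabet;
    because it must sum to 1, it has (alphabet_size - 1) free values.
    """
    return n_states * (alphabet_size - 1)


def linear_hmm_params(family_length):
    """Assumed simplified linear HMM: one match and one insert state per column."""
    return emission_params(2 * family_length)


def motif_hmm_params(motif_widths):
    """Assumed motif-based HMM: one match state per motif column,
    plus one spacer state before, between, and after the motifs."""
    n_spacers = len(motif_widths) + 1
    return emission_params(sum(motif_widths) + n_spacers)


if __name__ == "__main__":
    # Hypothetical family: 250 alignment columns, three motifs of widths 20, 15, 12.
    print("linear HMM emission parameters:", linear_hmm_params(250))
    print("motif HMM emission parameters: ", motif_hmm_params([20, 15, 12]))
```

Under these assumptions the motif-based model has roughly a tenth as many emission parameters to estimate, which is the intuition behind training on smaller data sets. The exact ratio depends on the real state architecture and on how transitions and spacer regions are parameterized.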

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alcohol Dehydrogenase / genetics
  • Algorithms
  • Amino Acid Sequence
  • Databases, Factual
  • Ferredoxins / genetics
  • Markov Chains*
  • Molecular Sequence Data
  • Proteins / genetics*
  • Sequence Alignment / methods
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Amino Acid
  • Software*
  • Stochastic Processes

Substances

  • Ferredoxins
  • Proteins
  • Alcohol Dehydrogenase