Using Dirichlet mixture priors to derive hidden Markov models for protein families

M Brown; R Hughey; A Krogh; I S Mian; K Sjölander; D Haussler

Proc Int Conf Intell Syst Mol Biol. 1993:1:47-55.

Authors

M Brown¹, R Hughey, A Krogh, I S Mian, K Sjölander, D Haussler

Affiliation

¹ University of California, Santa Cruz 95064, USA.

PMID: 7584370

Abstract

A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced. This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments. It is shown that this Bayesian method can improve the quality of HMMs produced from small training sets. Specific experiments on the EF-hand motif are reported, for which these priors are shown to produce HMMs with higher likelihood on unseen data, and fewer false positives and false negatives in a database search task.
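
The abstract does not give the estimation formula, so the following is only a minimal sketch of the standard posterior-mean estimate under a Dirichlet mixture prior, not the authors' implementation. Given observed counts n for one HMM state or alignment column, each mixture component j receives a posterior weight proportional to q_j · B(n + α_j) / B(α_j), and the estimate is the weighted average of the per-component values (n_i + α_ji) / (|n| + |α_j|). The function names, the 3-letter toy alphabet, and the mixture parameters are all assumptions made for illustration; a real prior would have 20-dimensional components estimated from existing HMMs or alignments.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def posterior_mean(counts, weights, components):
    """Posterior mean symbol distribution under a Dirichlet mixture prior.

    counts     -- observed counts n_i for each symbol
    weights    -- mixture coefficients q_j (should sum to 1)
    components -- one Dirichlet parameter vector alpha_j per component
    """
    # Log posterior weight of each component, up to a shared constant:
    # log q_j + log B(n + alpha_j) - log B(alpha_j)
    log_post = []
    for q, alpha in zip(weights, components):
        shifted = [n + a for n, a in zip(counts, alpha)]
        log_post.append(log(q) + log_beta(shifted) - log_beta(alpha))

    # Normalise in a numerically stable way (subtract the maximum first).
    top = max(log_post)
    unnorm = [exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    resp = [u / z for u in unnorm]

    # Average the per-component pseudocount estimates.
    total = sum(counts)
    estimate = [0.0] * len(counts)
    for r, alpha in zip(resp, components):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            estimate[i] += r * (n + a) / denom
    return estimate


# Hypothetical toy prior over a 3-letter alphabet with two components.
toy_weights = [0.6, 0.4]
toy_components = [[5.0, 0.5, 0.5], [0.5, 3.0, 3.0]]
toy_estimate = posterior_mean([2, 0, 0], toy_weights, toy_components)
```

With few observations the estimate stays close to the prior, and symbols with zero counts still receive nonzero probability; as counts grow, the observed frequencies dominate. This is the behavior that makes such priors useful for small training sets.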

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Amino Acid Sequence*
Amino Acids / chemistry
Bayes Theorem
Databases, Factual
Markov Chains
Models, Molecular
Models, Statistical*
Protein Conformation
Proteins / chemistry
Proteins / classification*
Sequence Alignment / methods*

Substances

Amino Acids
Proteins

Grants and funding

GM17129/GM/NIGMS NIH HHS/United States