Bayesian restoration of a hidden Markov chain with applications to DNA sequencing

J Comput Biol. Summer 1999;6(2):261-77. doi: 10.1089/cmb.1999.6.261.

Abstract

Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for analyzing molecular sequence data. An HMM can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box evolves according to a finite-state Markov chain, and the observable output is random, with a distribution determined by the current state of that hidden chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Markov chain Monte Carlo (MCMC) algorithm that draws samples from the full posterior distribution of the hidden Markov chain paths. We also address the problem of estimating the probability of individual paths and the Monte Carlo error associated with these estimates. The method is illustrated on a DNA multiple sequence alignment problem, and the special structure of the HMM used for sequence alignment is considered in detail. In conclusion, we discuss certain aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.
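To make the idea of sampling hidden paths from their posterior concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a toy two-state HMM whose parameters are known, and it swaps in forward filtering with backward sampling, a standard technique that draws independent paths from the posterior exactly, where the abstract describes an MCMC sampler. All names and numbers (`pi`, `A`, `B`, `obs`, the function names) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def forward_filter(obs, pi, A, B):
    """Return filtering distributions P(state_t | obs[0..t]) and log P(obs)."""
    n = len(pi)
    filt, loglik, prev = [], 0.0, None
    for t, o in enumerate(obs):
        if t == 0:
            unnorm = [pi[k] * B[k][o] for k in range(n)]
        else:
            unnorm = [sum(prev[j] * A[j][k] for j in range(n)) * B[k][o]
                      for k in range(n)]
        norm = sum(unnorm)
        loglik += math.log(norm)
        prev = [u / norm for u in unnorm]
        filt.append(prev)
    return filt, loglik


def sample_path(obs, pi, A, B, rng):
    """Draw one hidden path from P(path | obs) by backward sampling."""
    filt, _ = forward_filter(obs, pi, A, B)
    states = list(range(len(pi)))
    path = [0] * len(obs)
    path[-1] = rng.choices(states, weights=filt[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        # P(state_t = j | state_{t+1}, obs[0..t]) is proportional to this weight.
        w = [filt[t][j] * A[j][path[t + 1]] for j in states]
        path[t] = rng.choices(states, weights=w)[0]
    return tuple(path)


def exact_path_posterior(path, obs, pi, A, B):
    """Exact P(path | obs) = P(path, obs) / P(obs), used here as a check."""
    _, loglik = forward_filter(obs, pi, A, B)
    logp = math.log(pi[path[0]] * B[path[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(A[path[t - 1]][path[t]] * B[path[t]][obs[t]])
    return math.exp(logp - loglik)


def estimate_path_probability(path, samples):
    """Monte Carlo estimate of P(path | obs) and its binomial standard error."""
    n = len(samples)
    p_hat = Counter(samples)[tuple(path)] / n
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical toy model: 2 hidden states, 2 output symbols.
pi = [0.5, 0.5]                      # initial state distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # transition probabilities A[from][to]
B = [[0.8, 0.2], [0.3, 0.7]]         # emission probabilities B[state][symbol]
obs = [0, 0, 1, 1]                   # observed output sequence

rng = random.Random(0)
samples = [sample_path(obs, pi, A, B, rng) for _ in range(5000)]

target = (0, 0, 1, 1)
p_hat, se = estimate_path_probability(target, samples)
print("estimate:", p_hat, "std. error:", se)
print("exact:   ", exact_path_posterior(target, obs, pi, A, B))
```

The binomial standard error is appropriate here only because these draws are independent. Draws from an MCMC sampler are generally correlated, so the Monte Carlo error of a path-probability estimate would need to account for that correlation.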

MeSH terms

  • Algorithms*
  • Base Sequence
  • Bayes Theorem*
  • Computer Simulation
  • Likelihood Functions
  • Logic
  • Markov Chains*
  • Monte Carlo Method
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Statistical Distributions