A Bayesian approach to DNA sequence segmentation

Biometrics. 2004 Sep;60(3):573-81; discussion 581-8. doi: 10.1111/j.0006-341X.2004.00206.x.


Many deoxyribonucleic acid (DNA) sequences display compositional heterogeneity in the form of segments of similar structure. This article describes a Bayesian method that identifies such segments by using a Markov chain governed by a hidden Markov model. Markov chain Monte Carlo (MCMC) techniques are employed to compute all posterior quantities of interest and, in particular, allow inferences to be made regarding the number of segment types and the order of Markov dependence in the DNA sequence. The method is applied to the segmentation of the bacteriophage lambda genome, a common benchmark sequence used for the comparison of statistical segmentation algorithms.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteriophage lambda / genetics
  • Bayes Theorem*
  • Biometry
  • DNA, Viral / genetics
  • Genome, Viral
  • Markov Chains
  • Models, Statistical
  • Monte Carlo Method
  • Sequence Analysis, DNA / statistics & numerical data*


Substances

  • DNA, Viral