Exact goodness-of-fit tests for Markov chains

Biometrics. 2013 Jun;69(2):488-96. doi: 10.1111/biom.12009. Epub 2013 Feb 21.


Goodness-of-fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because of the paucity of the data or because a nonstandard test statistic is of interest. In this article, we describe exact goodness-of-fit tests for first- and higher-order Markov chains, with particular attention given to time-reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply to both single and multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948-1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second concerns a four-state DNA sequence and lends support to the original conclusion that a second-order Markov chain provides an adequate fit to the data. The last concerns six-state atomistic data arising in a molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first-order reversible Markov chain at 6-picosecond time steps.
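
As an illustration of the conditioning idea, the sketch below handles only the simplest case: a single sequence and a first-order null. For such a chain the likelihood depends on the data only through the initial state and the transition counts, so every sequence sharing them is equally likely under the null. The sampler is an Euler-trail shuffle in the style of Altschul and Erickson, which is one known way to draw such sequences uniformly; it is an assumption here, not necessarily the authors' implementation. The function names, the longest-run statistic, and the toy dry/wet sequence are all hypothetical.

```python
import random
from collections import defaultdict


def _reaches(state, target, succ, exit_idx):
    """Follow the chosen last exits from `state`; True if `target` is reached."""
    seen = set()
    while state != target:
        if state in seen:  # cycle of last exits that never reaches the target
            return False
        seen.add(state)
        state = succ[state][exit_idx[state]]
    return True


def shuffle_preserving_transitions(seq, rng):
    """Draw a sequence with the same first state and transition counts as `seq`,
    uniformly over all such sequences (rejection-based Euler-trail shuffle)."""
    if len(seq) < 2:
        return list(seq)
    first, last = seq[0], seq[-1]
    # Successor list of each state: one entry per observed transition.
    succ = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    others = [s for s in succ if s != last]
    # Choose a "last exit" for every state except the final one, and retry
    # until following last exits from any state leads to the final state.
    while True:
        exit_idx = {s: rng.randrange(len(succ[s])) for s in others}
        if all(_reaches(s, last, succ, exit_idx) for s in others):
            break
    # Shuffle the remaining exits of each state; the last exit stays last.
    order = {}
    for s, nxt in succ.items():
        nxt = list(nxt)
        if s != last:
            tail = nxt.pop(exit_idx[s])
            rng.shuffle(nxt)
            nxt.append(tail)
        else:
            rng.shuffle(nxt)
        order[s] = nxt
    # Walk from the first state, using each state's exits in the chosen order.
    out, pos = [first], defaultdict(int)
    for _ in range(len(seq) - 1):
        s = out[-1]
        out.append(order[s][pos[s]])
        pos[s] += 1
    return out


def longest_run(seq):
    """Hypothetical test statistic: length of the longest run of one state."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best


def exact_mc_pvalue(seq, statistic, n_sim=999, seed=0):
    """Monte Carlo p-value; the observed value is counted among the draws."""
    rng = random.Random(seed)
    t_obs = statistic(seq)
    hits = sum(
        statistic(shuffle_preserving_transitions(seq, rng)) >= t_obs
        for _ in range(n_sim)
    )
    return (1 + hits) / (1 + n_sim)


# Toy data (made up for illustration): D = dry day, W = wet day.
toy = list("DDWWWDDDDWWDWDDDWWWWWD")
print(exact_mc_pvalue(toy, longest_run))
```

Any other function of the sequence can be passed in place of longest_run, which mirrors the free choice of test statistic. The rejection step can become slow when there are many states, and the sketch does not cover multiple sequences, higher-order chains, or reversible chains.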

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Biometry / methods*
  • Computer Simulation
  • Dipeptides / chemistry
  • Humans
  • Markov Chains*
  • Models, Statistical*
  • Molecular Dynamics Simulation / statistics & numerical data
  • Monte Carlo Method
  • Rain
  • Stochastic Processes
  • Washington


Substances

  • Dipeptides
  • alanylalanine