The embedding problem for Markov models of nucleotide substitution

PLoS One. 2013 Jul 30;8(7):e69187. doi: 10.1371/journal.pone.0069187. Print 2013.

Abstract

Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability. A process is non-embeddable if it cannot be embedded in a continuous-time, time-homogeneous Markov process. In this study, non-embeddability was demonstrated to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and those with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could affect the correct prediction of phylogenies, but only at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time-homogeneity assumption, and consequently the impact is likely to be minor.
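A substitution probability matrix P is embeddable when it can be written as P = exp(Qt) for some valid rate matrix Q. The snippet below is a minimal, hypothetical sketch of one common check, not the procedure used in the study: it computes the principal matrix logarithm of P with a pure-Python Mercator series and tests whether the result is a valid rate matrix (non-negative off-diagonal entries, rows summing to zero). It assumes every diagonal entry of P exceeds 0.5, which guarantees the series converges. Passing the check shows P is embeddable. Failing it is conclusive only when the principal logarithm is P's sole real logarithm, for example when P has distinct positive real eigenvalues, as the toy triangular matrix does. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def principal_log(p: Matrix, terms: int = 500) -> Matrix:
    """Principal logarithm via log(I + A) = A - A^2/2 + A^3/3 - ..., A = P - I.

    Assumption (hypothetical restriction of this sketch): every diagonal
    entry of the stochastic matrix P exceeds 0.5. By Gershgorin's theorem
    every eigenvalue of A then has modulus < 1, so the series converges.
    """
    n = len(p)
    if any(p[i][i] <= 0.5 for i in range(n)):
        raise ValueError("sketch only handles P with all diagonal entries > 0.5")
    a = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    log_p = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in a]  # holds A^k for the current term k
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                log_p[i][j] += sign * power[i][j] / k
        power = mat_mul(power, a)
    return log_p


def is_valid_generator(q: Matrix, tol: float = 1e-9) -> bool:
    """Rate-matrix conditions: non-negative off-diagonals, rows summing to 0."""
    n = len(q)
    for i in range(n):
        if abs(sum(q[i])) > tol:
            return False
        if any(q[i][j] < -tol for j in range(n) if j != i):
            return False
    return True


def embeddable_via_principal_log(p: Matrix) -> bool:
    """True if the principal log of P is a valid rate matrix (P = exp(Qt))."""
    return is_valid_generator(principal_log(p))


# Jukes-Cantor-style P = exp(Qt) with exp(-4at) = 0.6: embeddable.
jc = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]
print(embeddable_via_principal_log(jc))  # True

# Toy matrix with P[0][2] = 0 although P[0][1] > 0 and P[1][2] > 0;
# its eigenvalues 0.9, 0.8, 0.7, 1 are distinct and positive.
toy = [[0.9, 0.1, 0.0, 0.0],
       [0.0, 0.8, 0.2, 0.0],
       [0.0, 0.0, 0.7, 0.3],
       [0.0, 0.0, 0.0, 1.0]]
print(embeddable_via_principal_log(toy))  # False
```

For the Jukes-Cantor example the recovered off-diagonal rate is -ln(0.6)/4, about 0.128. For the toy matrix the principal logarithm has a negative entry at position (0, 2), so no time-homogeneous rate matrix reproduces it.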

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Evolution, Molecular*
  • Humans
  • Introns
  • Markov Chains*
  • Mice
  • Models, Genetic*
  • Mutation*
  • Nucleotides / genetics
  • Open Reading Frames / genetics
  • Phylogeny
  • Rats

Substances

  • Nucleotides

Grant support

Research was funded by an ARC grant (Australian Research Council – http://www.arc.gov.au/) awarded to GAH and VBY. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.