Predicting the ancestral character changes in a tree is typically easier than predicting the root state

Olivier Gascuel; Mike Steel

doi:10.1093/sysbio/syu010

Syst Biol. 2014 May;63(3):421-35. doi: 10.1093/sysbio/syu010. Epub 2014 Feb 21.

Authors

Olivier Gascuel¹, Mike Steel

Affiliation

¹ Institut de Biologie Computationnelle, LIRMM, UMR 5506 CNRS - Univ. Montpellier 2, Case courrier 06011, 95 rue de la Galéra, 34095 Montpellier, France; Allan Wilson Centre, University of Canterbury, Ilam Road 8041, Christchurch, New Zealand.

PMID: 24562915

Abstract

Predicting the ancestral sequences of a group of homologous sequences related by a phylogenetic tree has been the subject of many studies, and numerous methods have been proposed for this purpose. Theoretical results show that when the substitution rates become too large, reconstructing the ancestral state at the tree root is no longer feasible. Here, we also study the reconstruction of the ancestral changes that occurred along the tree edges. We show that, depending on the tree and branch length distribution, reconstructing these changes (i.e., reconstructing the ancestral state of all internal nodes in the tree) may be easier or harder than reconstructing the ancestral root state. However, results from information theory indicate that for the standard Yule tree, the task of reconstructing internal node states remains feasible, even for very high substitution rates. Moreover, computer simulations demonstrate that this result still holds for more complex trees and scenarios. For a large variety of counting, parsimony- and likelihood-based methods, the predictive accuracy for a randomly selected internal node in the tree is indeed much higher than the accuracy of the same method when applied to the tree root. Moreover, parsimony- and likelihood-based methods appear to be remarkably robust to sampling bias and model mis-specification.
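The comparison described in the abstract can be illustrated with a small simulation. The Python sketch below is not the authors' code: it assumes a two-state symmetric substitution model and Fitch parsimony as simple stand-ins, and all function names (`yule_tree`, `evolve`, `fitch`, `compare`) and parameter values are hypothetical. It grows a Yule tree, evolves one binary character along it, and measures how often parsimony recovers the true state at the root versus at one randomly chosen non-root internal node.

```python
import math
import random


def yule_tree(n_leaves, rng, birth_rate=1.0):
    """Grow a Yule (pure-birth) tree with n_leaves >= 2 leaves.

    Returns (children, length, root). Node ids increase from parent to child.
    """
    children, length = {0: []}, {0: 0.0}
    active, next_id = [0], 1
    while True:
        # With k living lineages, the wait to the next event is Exp(k * birth_rate).
        wait = rng.expovariate(len(active) * birth_rate)
        for v in active:
            length[v] += wait
        if len(active) == n_leaves:
            break  # the final wait gives the leaves nonzero pendant edges
        parent = active.pop(rng.randrange(len(active)))
        for _ in range(2):
            children[parent].append(next_id)
            children[next_id], length[next_id] = [], 0.0
            active.append(next_id)
            next_id += 1
    return children, length, 0


def evolve(children, length, root, rate, rng):
    """Evolve a binary character under an assumed two-state symmetric model."""
    state = {root: rng.randrange(2)}
    for v in sorted(children):  # parents are visited before their children
        for c in children[v]:
            p_change = 0.5 * (1.0 - math.exp(-2.0 * rate * length[c]))
            state[c] = state[v] ^ (rng.random() < p_change)
    return state


def fitch(children, root, leaf_state, rng):
    """Return one most-parsimonious assignment for the internal nodes."""
    sets = {}
    for v in sorted(children, reverse=True):  # bottom-up pass
        if not children[v]:
            sets[v] = {leaf_state[v]}
        else:
            a, b = (sets[c] for c in children[v])
            sets[v] = (a & b) or (a | b)
    pred = {root: rng.choice(sorted(sets[root]))}
    for v in sorted(children):  # top-down pass, ties broken at random
        for c in children[v]:
            if children[c]:
                keep = pred[v] in sets[c]
                pred[c] = pred[v] if keep else rng.choice(sorted(sets[c]))
    return pred


def compare(n_leaves=50, rate=1.0, n_trials=500, seed=0):
    """Estimate accuracy at the root and at a random non-root internal node."""
    rng = random.Random(seed)
    root_hits = node_hits = 0
    for _ in range(n_trials):
        children, length, root = yule_tree(n_leaves, rng)
        truth = evolve(children, length, root, rate, rng)
        leaves = {v: truth[v] for v in children if not children[v]}
        pred = fitch(children, root, leaves, rng)
        v = rng.choice([u for u in pred if u != root])  # needs n_leaves >= 3
        root_hits += pred[root] == truth[root]
        node_hits += pred[v] == truth[v]
    return root_hits / n_trials, node_hits / n_trials


if __name__ == "__main__":
    root_acc, node_acc = compare()
    print(f"root accuracy: {root_acc:.3f}, internal-node accuracy: {node_acc:.3f}")
```

Raising `rate` in `compare` is one way to probe the high-substitution regime discussed above. The sketch excludes the root when sampling an internal node, which is a choice made here for contrast and may differ from the sampling used in the article.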

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Classification*
Computer Simulation
Likelihood Functions
Models, Theoretical*
Phylogeny*
Probability