New Statistical Criteria Detect Phylogenetic Bias Caused by Compositional Heterogeneity

Mol Biol Evol. 2017 Jun 1;34(6):1529-1534. doi: 10.1093/molbev/msx092.

Abstract

In statistical phylogenetic analyses of DNA sequences, models of evolutionary change commonly assume that base composition is stationary through time and across lineages. This assumption is violated by many data sets, but it is unclear whether the magnitude of these violations is sufficient to mislead phylogenetic inference. We investigated the impacts of compositional heterogeneity on phylogenetic estimates using a method for assessing model adequacy. Based on a detailed simulation study, we found that common frequentist criteria are highly conservative, such that the model is often rejected when the phylogenetic estimates do not show clear signs of bias. We propose new criteria and provide guidelines for their usage. We apply these criteria to genome-scale data from 40 birds and find that loci with severely non-homogeneous base composition are uncommon. Our results show the importance of using well-informed diagnostic statistics when testing model adequacy for phylogenomic analyses.
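The abstract does not spell out how the diagnostic statistic is computed, so the following is only a minimal sketch of a conventional chi-squared statistic for compositional homogeneity, not the authors' implementation. It treats the alignment as a taxa-by-base contingency table and sums the squared deviations of observed from expected base counts. The function names, the dictionary-based alignment format, and the decision to ignore gaps and ambiguity codes are assumptions made for illustration. In a model-adequacy setting, a statistic like this would typically be computed on the empirical alignment and compared with values from data simulated under the fitted model; that simulation step is not shown here.

```python
from collections import Counter

BASES = "ACGT"  # gaps and ambiguity codes are ignored (an assumption of this sketch)


def base_counts(alignment):
    """Return one row of A, C, G, T counts per sequence in the alignment.

    `alignment` is a hypothetical dict mapping taxon name -> sequence string.
    """
    table = []
    for seq in alignment.values():
        counts = Counter(seq.upper())
        table.append([counts[b] for b in BASES])
    return table


def chi_squared_composition(alignment):
    """Chi-squared statistic for homogeneity of base composition across taxa.

    Expected counts come from the row (taxon) and column (base) totals of
    the contingency table, as in a standard test of homogeneity.
    """
    table = base_counts(alignment)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip bases absent from the whole alignment
                stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    homogeneous = {"taxon1": "ACGT", "taxon2": "TGCA"}
    heterogeneous = {"taxon1": "AAAA", "taxon2": "CCCC"}
    print(chi_squared_composition(homogeneous))    # 0.0
    print(chi_squared_composition(heterogeneous))  # 8.0
```

Larger values indicate stronger departures from a shared base composition. According to the abstract, judging such values by common frequentist criteria is often overly conservative, so any threshold applied to this statistic should be chosen with care.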

Keywords: chi-squared statistic test; compositional heterogeneity; model adequacy; phylogenetic inference; predictive simulations; substitution models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Composition / genetics
  • Base Sequence / genetics
  • Bias
  • Biological Evolution
  • Biometry / methods
  • Birds / genetics
  • Computer Simulation
  • Evolution, Molecular
  • Models, Genetic
  • Phylogeny
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data