On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods

Syst Biol. 2015 Jul;64(4):663-76. doi: 10.1093/sysbio/syv016. Epub 2015 Mar 25.

Abstract

The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) as a result of multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.

Keywords: coalescent-based methods; gene tree estimation error; incomplete lineage sorting; multi-species coalescent; species tree reconstruction; statistical consistency.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Classification / methods*
  • Computer Simulation
  • Genes / genetics
  • Phylogeny*