The Effect of Gene Flow on Coalescent-based Species-Tree Inference

Syst Biol. 2018 Sep 1;67(5):770-785. doi: 10.1093/sysbio/syy020.

Abstract

Most current methods for inferring species-level phylogenies under the coalescent model assume that no gene flow occurs following speciation. Several studies have examined the impact of gene flow (e.g., Eckert and Carstens 2008; Chung and Ané 2011; Leaché et al. 2014; Solís-Lemus et al. 2016) and of ancestral population structure (DeGiorgio and Rosenberg 2016) on the performance of species-level phylogenetic inference, and analytic results have been proven for network models of gene flow (e.g., Solís-Lemus et al. 2016; Zhu et al. 2016). However, there are few analytic results for a continuous model of gene flow following speciation, despite the development of mathematical tools that could facilitate such study (e.g., Hobolth et al. 2011; Andersen et al. 2014; Tian and Kubatko 2016). In this article, we consider a three-taxon isolation-with-migration model that allows gene flow between sister taxa for a brief period following speciation, as well as variation in the effective population sizes across the species tree. We derive the probabilities of each of the three gene tree topologies under this model, and show that for certain choices of the gene flow and effective population size parameters, anomalous gene trees (i.e., gene trees that are discordant with the species tree but that have higher probability than the gene tree concordant with the species tree) exist. We characterize the region of parameter space producing anomalous trees and show that the probability of the gene tree that is concordant with the species tree can be arbitrarily small. We then show that there is theoretical support for using SVDQuartets with an outgroup to infer the rooted three-taxon species tree in a model of gene flow between sister taxa. We study the performance of SVDQuartets on simulated data and compare it to three other commonly used methods for species tree inference: ASTRAL, MP-EST, and concatenation. The simulations show that ASTRAL, MP-EST, and concatenation can be statistically inconsistent when gene flow is present, while SVDQuartets performs well, though large sample sizes may be required for certain parameter choices.
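For context, the standard multispecies coalescent result with no gene flow is restated here. It assumes species tree ((A,B),C), one lineage a, b, c sampled from each species, and an internal branch of length T in coalescent units. This is the textbook baseline, not the isolation-with-migration formulas derived in the article. Lineages a and b coalesce on the internal branch with probability 1 - e^{-T}; otherwise all three reach the root population, where each of the three topologies is equally likely:

```latex
\begin{aligned}
P\big(((a,b),c)\big) &= \left(1 - e^{-T}\right) + \tfrac{1}{3}e^{-T} = 1 - \tfrac{2}{3}e^{-T},\\
P\big(((a,c),b)\big) &= P\big(((b,c),a)\big) = \tfrac{1}{3}e^{-T},\\
P\big(((a,b),c)\big) - P\big(((a,c),b)\big) &= 1 - e^{-T} > 0 \quad \text{for all } T > 0.
\end{aligned}
```

Without gene flow the concordant tree is therefore always the most probable, and its probability never drops below 1/3. Under the isolation-with-migration model studied here, by contrast, anomalous gene trees can exist and the concordant probability can be made arbitrarily small.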
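The SVDQuartets-with-outgroup idea can be illustrated with a small sketch. It assumes four aligned DNA sequences in a hypothetical order A, B, C (ingroup) and O (outgroup). For each of the three quartet splits it builds the 16 by 16 flattening matrix of site-pattern frequencies. It then scores the split by the square root of the sum of squared singular values 11 through 16, the SVD score commonly associated with SVDQuartets. Under the model SVDQuartets assumes, the flattening for the true split has rank at most 10, so a lower score means better support. The lowest-scoring split, read relative to the outgroup, gives the rooted three-taxon tree. All function names, the taxon order, and the rule for skipping non-ACGT sites are illustrative assumptions, not the implementation used in the article.

```python
import itertools

import numpy as np

NUCS = "ACGT"
IDX = {n: i for i, n in enumerate(NUCS)}

# Hypothetical taxon order: ingroup A, B, C, then outgroup O.
# Each split gives the pair indexing rows and the pair indexing columns.
QUARTET_SPLITS = {
    "AB|CO": ((0, 1), (2, 3)),
    "AC|BO": ((0, 2), (1, 3)),
    "AO|BC": ((0, 3), (1, 2)),
}
# Rooting by the outgroup: the ingroup taxon paired with O diverged first.
ROOTED_TREE = {"AB|CO": "((A,B),C)", "AC|BO": "((A,C),B)", "AO|BC": "((B,C),A)"}


def site_pattern_counts(alignment):
    """Count site patterns for four aligned sequences as a 4x4x4x4 array.

    Assumption of this sketch: columns with gaps or ambiguity codes are skipped.
    """
    if len(alignment) != 4 or len({len(s) for s in alignment}) != 1:
        raise ValueError("expected four aligned sequences of equal length")
    counts = np.zeros((4, 4, 4, 4))
    for column in zip(*(s.upper() for s in alignment)):
        if all(ch in IDX for ch in column):
            counts[tuple(IDX[ch] for ch in column)] += 1
    return counts


def flattening(freqs, split):
    """Reshape the pattern array into the 16x16 matrix for split (i,j)|(k,l)."""
    (i, j), (k, l) = split
    # Row index = 4*x_i + x_j, column index = 4*x_k + x_l.
    return np.transpose(freqs, (i, j, k, l)).reshape(16, 16)


def svd_score(counts, split):
    """Square root of the sum of squared singular values 11..16 (lower is better)."""
    total = counts.sum()
    if total == 0:
        raise ValueError("no usable sites")
    sigma = np.linalg.svd(flattening(counts / total, split), compute_uv=False)
    return float(np.sqrt(np.sum(sigma[10:] ** 2)))  # sigma is sorted descending


def infer_rooted_triple(alignment):
    """Return the rooted ingroup tree implied by the best split, plus all scores."""
    counts = site_pattern_counts(alignment)
    scores = {name: svd_score(counts, s) for name, s in QUARTET_SPLITS.items()}
    best = min(scores, key=scores.get)
    return ROOTED_TREE[best], scores


# Toy data: every (x, y) state pair appears 5 times, with A == B and C == O.
pairs = list(itertools.product(NUCS, repeat=2))
seq_ab = "".join(x for x, _ in pairs) * 5
seq_co = "".join(y for _, y in pairs) * 5
tree, scores = infer_rooted_triple([seq_ab, seq_ab, seq_co, seq_co])
print(tree)  # ((A,B),C)
```

The toy alignment is built by hand so that the AB|CO flattening has rank 1 (score near zero), while the other two splits give diagonal matrices of full rank. It is not simulated under any gene-flow model. Real analyses would use far more sites and an established implementation such as the SVDQuartets option in PAUP*.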

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Gene Flow*
  • Genetic Speciation*
  • Models, Genetic*
  • Phylogeny*
  • Population Density
  • Probability

Associated data

  • Dryad/10.5061/dryad.j11fh