Concordance-Based Approaches for the Inference of Relationships and Molecular Rates with Phylogenomic Data Sets

Syst Biol. 2022 Jun 16;71(4):943-958. doi: 10.1093/sysbio/syab052.

Abstract

Gene tree conflict is common, and finding methods to analyze and alleviate the negative effects that conflict has on species tree analysis is a crucial part of phylogenomics. This study aims to expand the discussion of inferring species trees and molecular branch lengths when conflict is present. Conflict is typically examined in two ways: inferring its prevalence and inferring the influence of individual genes (how strongly one gene supports a given topology compared to an alternative topology). Here, we examine a procedure that incorporates both conflict and the influence of genes in order to infer evolutionary relationships. All supported relationships in the gene trees are analyzed, and the likelihoods of the genes constrained to each relationship are summed to provide a likelihood for that relationship. Consensus tree assembly then proceeds from these summed likelihoods: relationships are chosen in order of decreasing likelihood, and each is accepted only if it does not conflict with a relationship that has a higher likelihood score. If the most likely relationships cannot all be combined into a single bifurcating tree, then multiple trees are produced and a consensus tree with a polytomy is created. This procedure allows more influential genes to have a greater influence on an inferred relationship, does not assume conflict has arisen from any one source, and does not force the data set to produce a single bifurcating tree. Using this approach on three empirical data sets, we examine and discuss the relationship between the influence and the prevalence of gene tree conflict. We find that in one of the data sets, assembling a bifurcating consensus tree composed solely of the most likely relationships is impossible. To account for conflict in molecular rate analysis, we also introduce a concordance-based approach to the summary and estimation of branch lengths suitable for downstream comparative analyses.
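The assembly step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each relationship is represented as a rooted clade (a frozenset of taxon names), that every gene supplies a log-likelihood for every candidate relationship, and that two clades conflict when they overlap without one containing the other. The function names and the toy scores are hypothetical, and the multiple-tree and polytomy step is not covered.

```python
from collections import defaultdict


def compatible(a, b):
    """Two rooted clades can coexist in one tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def sum_scores(gene_scores):
    """Sum per-gene log-likelihoods for each candidate relationship."""
    totals = defaultdict(float)
    for per_gene in gene_scores:
        for clade, lnl in per_gene.items():
            totals[clade] += lnl
    return dict(totals)


def greedy_assemble(totals):
    """Accept relationships in order of decreasing summed log-likelihood,
    skipping any that conflict with an already accepted relationship."""
    accepted = []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for clade, _ in ranked:
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
    return accepted


# Hypothetical toy data: three genes, three candidate clades on taxa A-D.
AB, AC, ABC = frozenset("AB"), frozenset("AC"), frozenset("ABC")
gene_scores = [
    {AB: -100.0, AC: -105.0, ABC: -98.0},
    {AB: -210.0, AC: -200.0, ABC: -199.0},
    {AB: -50.0, AC: -58.0, ABC: -49.0},
]
totals = sum_scores(gene_scores)
tree_clades = greedy_assemble(totals)
```

In this toy case the second gene prefers AC over AB, but the summed scores favor AB, so AC is rejected as conflicting with a higher-scoring relationship. Ties between conflicting relationships, which would lead to the polytomy case, are not handled here.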
We demonstrate through simulation that, even under high levels of stochastic conflict, the mean and median of the concordant rates recapitulate the true molecular rate better than a supermatrix approach does. Using a large phylogenomic data set, we examine rate heterogeneity across concordant genes with a focus on the branch subtending crown angiosperms. Notably, we find highly variable rates of evolution along this branch. The approaches outlined here have several limitations, but they also offer alternative methods for harnessing the complexity of phylogenomic data sets and enriching our inferences of both species relationships and evolutionary processes. [Branch length estimation; consensus tree; gene tree conflict; gene tree filtering; phylogenetics; phylogenomics.]
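As a rough illustration of the concordance-based summary, the Python sketch below collects the length of a species tree branch only from gene trees that contain the same clade, then reports the mean and median. It assumes each gene tree has already been reduced to a mapping from rooted clades to the length of the subtending branch. The function name, data layout, and numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, median


def concordant_rate_summary(species_clade, gene_trees):
    """Summarize the branch subtending `species_clade` across concordant genes.

    gene_trees: list of dicts mapping a clade (frozenset of taxa) to the
    length of the branch subtending it in that gene tree.
    Returns None when no gene tree contains the clade.
    """
    lengths = [bl[species_clade] for bl in gene_trees if species_clade in bl]
    if not lengths:
        return None
    return {"n": len(lengths), "mean": mean(lengths), "median": median(lengths)}


# Hypothetical toy data: the second gene tree conflicts with clade AB.
AB, AC, ABC = frozenset("AB"), frozenset("AC"), frozenset("ABC")
gene_trees = [
    {AB: 0.10, ABC: 0.02},
    {AC: 0.07, ABC: 0.04},
    {AB: 0.30, ABC: 0.03},
]
summary_ab = concordant_rate_summary(AB, gene_trees)
summary_abc = concordant_rate_summary(ABC, gene_trees)
```

Only two of the three toy gene trees contribute to the AB summary, since the conflicting gene has no comparable branch. Reporting the number of concordant genes alongside the mean and median shows how much data supports each estimate.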

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Magnoliopsida*
  • Phylogeny

Associated data

  • Dryad/10.5061/dryad.2rbnzs7m9