Identifying localized biases in large datasets: a case study using the avian tree of life

Mol Phylogenet Evol. 2013 Dec;69(3):1021-32. doi: 10.1016/j.ympev.2013.05.029. Epub 2013 Jun 20.

Abstract

Large-scale multi-locus studies have become common in molecular phylogenetics, with new studies continually adding to previous datasets in an effort to fully resolve the tree of life. Total evidence analyses that combine existing data with newly collected data are expected to increase the power of phylogenetic analyses to resolve difficult relationships. However, they might be subject to localized biases, with one or a few loci having a strong and potentially misleading influence upon the results. To examine this possibility, we combined a newly collected 31-locus dataset that includes representatives of all major avian lineages with a published dataset of 19 loci that has a comparable number of sites (Hackett et al., 2008. Science 320, 1763-1768). This allowed us to explore the advantages of conducting total evidence analyses, and to determine whether it was also important to analyze new datasets independent of published ones. The total evidence analysis yielded results very similar to the published results, with only slightly increased support at a few nodes. However, analyzing the 31- and 19-locus datasets separately highlighted several differences. Two clades received strong support in the published dataset and total evidence analysis, but the support appeared to reflect bias at a single locus (β-fibrinogen [FGB]). The signal in FGB that supported these relationships was sufficient to result in their recovery with bootstrap support, even when combined with 49 loci lacking that signal. FGB did not appear to have a substantial impact upon the results of species tree methods, but another locus (brain-derived neurotrophic factor [BDNF]) did have an impact upon those analyses. These results demonstrated that localized biases can influence large-scale phylogenetic analyses, but they also indicated that considering independent evidence and exploring multiple analytical approaches could reveal them.
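
The idea of a single locus dominating a combined analysis can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes that a per-locus support value for a clade of interest (for example, a log-likelihood difference between trees with and without the clade) has already been computed by some external tool, and it flags loci whose value is an outlier by a median-based (modified z-score) rule. The function name, the threshold of 3.5, the locus labels, and all numbers are hypothetical and purely illustrative.

```python
from statistics import median


def flag_outlier_loci(support_by_locus, threshold=3.5):
    """Return loci whose clade support is an outlier among all loci.

    support_by_locus: dict mapping locus name -> per-locus support value
    (hypothetical input, e.g. a log-likelihood difference for one clade).
    Uses the modified z-score, 0.6745 * (x - median) / MAD.
    """
    values = list(support_by_locus.values())
    med = median(values)
    # Median absolute deviation: a spread estimate robust to the outlier itself.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread among most loci, so the score is undefined; flag nothing.
        return []
    flagged = []
    for locus, value in support_by_locus.items():
        score = 0.6745 * (value - med) / mad
        if abs(score) > threshold:
            flagged.append(locus)
    return flagged


# Made-up values for illustration only; these are not results from the paper.
toy_support = {
    "LOCUS_A": 0.4,
    "LOCUS_B": -0.3,
    "LOCUS_C": 0.1,
    "LOCUS_D": -0.2,
    "LOCUS_E": 12.0,
}
print(flag_outlier_loci(toy_support))
```

A flagged locus is only a candidate for closer inspection. Whether its signal reflects bias or genuine phylogenetic information would still need to be assessed with independent data, as the abstract describes.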

Keywords: Gene tree discordance; Incongruence; Localized biases; Phylogenomics.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Bias
  • Biological Evolution*
  • Birds / classification*
  • Birds / genetics
  • Likelihood Functions
  • Models, Genetic
  • Phylogeny*
  • Sequence Alignment
  • Sequence Analysis, DNA