Decisive data sets in phylogenomics: lessons from studies on the phylogenetic relationships of primarily wingless insects

Mol Biol Evol. 2014 Jan;31(1):239-49. doi: 10.1093/molbev/mst196. Epub 2013 Oct 18.


Phylogenetic relationships of the primarily wingless insects are still considered unresolved. Even the most comprehensive phylogenomic studies that addressed this question did not yield congruent results. To address these problems, we analyzed the sources of incongruence in these phylogenomic studies using an extended transcriptome data set. Our analyses showed that unevenly distributed missing data can be severely misleading by inflating node support despite the absence of phylogenetic signal. Consequently, only decisive data sets should be used, that is, data sets that exclusively comprise data blocks containing all taxa whose relationships are addressed. Additionally, we used Four-cluster Likelihood Mapping (FcLM) to measure the degree of congruence among the genes of a data set, as a measure of support alternative to the bootstrap. FcLM showed incongruent signal among genes, which in our case is correlated neither with the functional class assignment of these genes nor with model misspecification due to unpartitioned analyses. The data set analyzed here is currently the largest data set covering primarily wingless insects, but it failed to elucidate their interordinal phylogenetic relationships. Although this is unsatisfying from a phylogenetic perspective, we try to show that analyses of structure and signal within phylogenomic data can protect us from biased phylogenetic inferences due to analytical artifacts.
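The decisiveness criterion stated in the abstract can be illustrated with a minimal sketch. It assumes each data block (for example, a gene alignment) is summarized only by the set of taxa it contains, and it keeps a block only if every taxon whose relationships are addressed is present. The function name `decisive_blocks`, the gene names, and the toy taxon sets are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def decisive_blocks(blocks: Dict[str, Set[str]], focal_taxa: Set[str]) -> List[str]:
    """Return the names of data blocks that contain every focal taxon.

    Assumption: a block is kept only if it covers all taxa whose
    relationships are addressed, following the wording of the abstract.
    """
    # `focal_taxa <= taxa` is a subset test: every focal taxon is in the block.
    return sorted(name for name, taxa in blocks.items() if focal_taxa <= taxa)


# Hypothetical toy data: which taxa are represented in each gene block.
blocks = {
    "gene_A": {"Collembola", "Diplura", "Protura", "Pterygota"},
    "gene_B": {"Collembola", "Diplura", "Pterygota"},  # Protura missing
    "gene_C": {"Collembola", "Diplura", "Protura", "Pterygota", "Archaeognatha"},
}
focal = {"Collembola", "Diplura", "Protura", "Pterygota"}

kept = decisive_blocks(blocks, focal)
print(kept)
```

In this toy case, `gene_B` is dropped because it lacks one focal taxon; a block with additional, non-focal taxa is still retained.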

Keywords: Collembola; Diplura; ESTs; Ellipura; Entognatha; Nonoculata; Protura; conflicting hypotheses; likelihood quartet mapping; missing data; phylogenomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chromosome Mapping
  • Databases, Factual*
  • Evolution, Molecular*
  • Genomics
  • Genotyping Techniques / methods
  • Insecta / classification*
  • Insecta / genetics*
  • Models, Genetic
  • Phylogeny*
  • Sequence Alignment
  • Transcriptome