Alternative methods for concatenation of core genes indicate a lack of resolution in deep nodes of the prokaryotic phylogeny

Mol Biol Evol. 2008 Jan;25(1):83-91. doi: 10.1093/molbev/msm229. Epub 2007 Oct 16.


It has recently been proposed that a well-resolved Tree of Life can be obtained by concatenating shared genes. Such an approach faces several difficulties, however, especially in the prokaryotic part of the tree. We tackled some of them with a new combination of maximum likelihood-based methods, designed to make concatenation as safe and careful as possible. First, we applied the program concaterpillar to carefully aligned core genes. This program uses a hierarchical likelihood-ratio test framework to assess both topological congruence between gene phylogenies (i.e., whether different genes share the same evolutionary history) and branch-length congruence (i.e., whether genes that share the same history also share the same pattern of relative evolutionary rates). We thereby tested whether these core genes can be concatenated or should instead be sorted into distinct, mutually incongruent sets. Second, we developed a heat map approach that tracks how phylogenetic support for different bipartitions changes as sites of varying phylogenetic quality are added to the concatenation. These heat maps let us follow which phylogenetic signals strengthen or weaken as the concatenation grows, and detect emerging artifactual groupings, that is, groups that gain more and more support as more and more homoplasic sites are added to the analysis. We showed that, as far as 7 major prokaryotic lineages are concerned, only 22 core genes can be considered congruent and safely concatenated. This is even fewer than the number of genes retained to reconstruct a "Tree of One Per Cent." Furthermore, concatenating these 22 markers yields an unresolved tree, because the only groupings in the concatenation tree appear to reflect emerging artifacts. Using concatenated core genes as a framework for classifying uncharacterized environmental sequences can therefore be misleading.
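The abstract does not give the exact test statistics used by concaterpillar. As a minimal sketch of the branch-length congruence step only, assume a fixed shared topology, a null (congruent) model in which two gene sets share branch lengths up to a single relative-rate multiplier, and an alternative in which each set has its own branch lengths. Under these assumptions the models are nested, twice the log-likelihood difference is compared against a chi-squared distribution, and the degrees of freedom equal the 2n - 3 extra branch lengths of an unrooted binary tree on n taxa minus the one multiplier. The helper names `chi2_sf` and `branch_length_lrt` are hypothetical, and the log-likelihoods are assumed to come from a separate maximum likelihood run.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df.

    Uses the regularized upper incomplete gamma Q(df/2, x/2), built up from
    Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y) with the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)            # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        # Add the next recurrence term in log space for numerical stability.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def branch_length_lrt(lnl_shared: float, lnl_separate: float,
                      n_taxa: int, alpha: float = 0.05):
    """Hypothetical LRT of branch-length congruence for two gene sets on one topology.

    lnl_shared:   log-likelihood when both sets share branch lengths,
                  up to one relative-rate multiplier (assumed null model)
    lnl_separate: log-likelihood when each set has its own branch lengths
    Returns (statistic, df, p_value, congruent), where congruent is True when
    the null model is not rejected at level alpha.
    """
    n_branches = 2 * n_taxa - 3   # branches in an unrooted binary tree
    df = n_branches - 1           # extra branch lengths minus the single multiplier
    stat = 2.0 * (lnl_separate - lnl_shared)
    # The models are nested, so a negative statistic can only come from
    # optimizer noise; clamp it to zero.
    stat = max(stat, 0.0)
    p = chi2_sf(stat, df)
    return stat, df, p, p >= alpha


if __name__ == "__main__":
    # Illustrative, made-up log-likelihoods for two gene sets on 10 taxa.
    print(branch_length_lrt(-10250.0, -10240.0, n_taxa=10))
```

With these made-up values the statistic is 20.0 on 16 degrees of freedom, and congruence is not rejected at the 5% level. The chi-squared tail is computed from the incomplete-gamma recurrence so that the sketch needs only the standard library. A hierarchical procedure would repeat such tests while merging gene sets, and would test topological congruence first; this sketch does neither.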

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Archaea / genetics*
  • Bacteria / genetics*
  • DNA, Concatenated / genetics*
  • Evolution, Molecular
  • Genes, Archaeal / genetics*
  • Genes, Bacterial / genetics*
  • Phylogeny*
  • Prokaryotic Cells / physiology
  • Sequence Analysis, DNA* / methods

Substances

  • DNA, Concatenated