, 5, 33

Do Orthologous Gene Phylogenies Really Support Tree-Thinking?



E Bapteste et al. BMC Evol Biol.


Background: Since Darwin's Origin of Species, reconstructing the Tree of Life has been a goal of evolutionists, and tree-thinking has become a major concept of evolutionary biology. Practically, building the Tree of Life has proven to be tedious. Too few morphological characters are useful for conducting conclusive phylogenetic analyses at the highest taxonomic level. Consequently, molecular sequences (genes, proteins, and genomes) likely constitute the only useful characters for constructing a phylogeny of all life. For this reason, tree-makers expect a lot from gene comparisons. The simultaneous study of the largest number of molecular markers possible is sometimes considered to be one of the best solutions in reconstructing the genealogy of organisms. This conclusion is a direct consequence of tree-thinking: if gene inheritance conforms to a tree-like model of evolution, sampling more of these molecules will provide enough phylogenetic signal to build the Tree of Life. The selection of congruent markers is thus a fundamental step in simultaneous analysis of many genes.

Results: Heat map analyses were used to investigate the congruence of orthologues in four datasets (archaeal, bacterial, eukaryotic and alpha-proteobacterial). We conclude that we simply cannot determine if a large portion of the genes have a common history. In addition, none of these datasets can be considered free of lateral gene transfer.

Conclusion: Our phylogenetic analyses do not support tree-thinking. These results have important conceptual and practical implications. We argue that representations other than a tree should be investigated in this case because a non-critical concatenation of markers could be highly misleading.


Figure 1
Figure 1A displays the heat map for the archaeal dataset and Figure 1B that for the eukaryotic dataset. Heat maps include two kinds of markers: actual ones, indicated by a red rectangle at the left of the heat map, and artificial markers with extreme LGT (see main text), indicated in blue. They are based on a set of plausible topologies (see main text). The numbers of genes and topologies in the analysis are indicated on the heat map. These heat maps are double-clustered by genes and by topologies. The hierarchical clusters are represented by a tree of genes and a tree of topologies along the heat map. In the left band, the relative distribution of red and blue rectangles reflects the presence or absence of clustering of actual markers with artificial ones. Inside a heat map, each dot of colour corresponds to the p-value for a given gene and a given topology. The p-values range from 0 (rejection) to 1 (support). The colour code associated with these p-values (from green for rejection to white for support) is reported above the heat map. On the right of each heat map, the orange brackets indicate regions containing markers with a weak discriminatory power; the green brackets indicate regions containing markers with a stronger discriminatory power. Amongst the markers with a stronger phylogenetic signal, pink arrows point to some instances of conflicting signal in actual markers. They indicate different columns displaying a contrasting pattern of colour and contradictory p-values for several orthologues in a dataset.
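The double clustering described in this legend can be illustrated with a minimal sketch. It assumes a small, entirely hypothetical matrix of p-values (rows are genes, columns are topologies) and uses Euclidean distance with average linkage, which is an assumption made for illustration and not necessarily the procedure used by the authors. The function names `cluster_order` and `double_cluster` are also hypothetical.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors of p-values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_order(vectors):
    """Average-linkage agglomerative clustering.

    Returns the indices of the input vectors in the order in which they
    would appear as leaves along the resulting hierarchical tree.
    """
    if not vectors:
        return []
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        def avg_dist(pair):
            ca, cb = clusters[pair[0]], clusters[pair[1]]
            total = sum(euclidean(vectors[i], vectors[j]) for i in ca for j in cb)
            return total / (len(ca) * len(cb))

        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2), key=avg_dist)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def double_cluster(pvalues):
    """Reorder a gene-by-topology p-value matrix by clustering rows and columns."""
    row_order = cluster_order(pvalues)
    columns = [list(col) for col in zip(*pvalues)]
    col_order = cluster_order(columns)
    reordered = [[pvalues[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered


# Hypothetical p-values: 4 genes (rows) tested against 3 topologies (columns).
# Values near 0 mean the gene rejects the topology; values near 1 mean support.
toy_pvalues = [
    [0.90, 0.80, 0.01],  # gene 0: supports topologies 0 and 1, rejects 2
    [0.02, 0.03, 0.95],  # gene 1: conflicting signal, supports only topology 2
    [0.85, 0.90, 0.02],  # gene 2: same pattern as gene 0
    [0.50, 0.60, 0.55],  # gene 3: weak discriminatory power, rejects nothing
]

row_order, col_order, reordered = double_cluster(toy_pvalues)
for gene, row in zip(row_order, reordered):
    print(gene, row)
```

In this toy example, genes with the same pattern of support end up next to each other, and a gene that rejects no topology sits apart from the genes with a stronger signal, which is the kind of pattern the brackets and arrows in the legend point out.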
Figure 2
Figure 2A displays the heat map for the alphaproteobacterial dataset and Figure 2B that for the bacterial dataset. See the legend of Figure 1 for details.
Figure 3
Figure 3 displays the synthesis of 34 alphaproteobacterial genes (atp1, atp6, atp9, cob, cox2, cox3, nad1, nad2, nad3, nad4, nad4l, nad5, nad6, nad7, nad8, nad11, rpl2, rpl5, rpl6, rpl11, rpl14, rpl16, rpoA, rpoB, rpoC, rps7, rps10, rps12, rps13, rps14, rps19, sdh2, sdh3 and tufA). The proposed vertical-inheritance backbone representing the concatenation tree is shown in dark blue, with the line thickness of an internal branch corresponding to the frequency of its support across the whole dataset. Support was considered significant when clades received > 50% bootstrap support. Putative LGT events are in orange, connecting donors (circles) with recipients (arrowheads); where there are multiple possible donor candidates, these converge onto a double arrowhead. This happens when the clade founded by a past LGT donor may have subsequently had its species membership obfuscated by later exchanges of genetic material, yielding a non-reference assemblage of species labels in a presumed lineage. Where the apparent donor of a gene falls outside of the taxa included in the analysis, one is created as a basal group taxon, indicated in light blue. In order to avoid graphical congestion, branches in the tree may be artificially extended, as dotted segments.
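The mapping from support frequency to branch thickness described in this legend can be sketched as follows. This is only an illustration under stated assumptions: the bootstrap values, the taxon labels `A`, `B` and `C`, the function names, and the linear width scale are all hypothetical. Only the > 50% bootstrap threshold and the gene names come from the legend.

```python
def support_frequency(bootstrap_by_gene, threshold=50.0):
    """Fraction of genes in which each clade gets bootstrap support above threshold.

    bootstrap_by_gene maps a gene name to a dict of {clade: bootstrap percent},
    where a clade is a frozenset of taxon labels.
    """
    n_genes = len(bootstrap_by_gene)
    clades = {c for supports in bootstrap_by_gene.values() for c in supports}
    counts = dict.fromkeys(clades, 0)
    for supports in bootstrap_by_gene.values():
        for clade, value in supports.items():
            if value > threshold:  # strictly greater than 50%, as in the legend
                counts[clade] += 1
    return {clade: counts[clade] / n_genes for clade in clades}


def line_width(frequency, min_width=0.5, max_width=4.0):
    """Linear map from a support frequency in [0, 1] to a drawing width (assumed scale)."""
    return min_width + frequency * (max_width - min_width)


# Hypothetical bootstrap values for three of the genes named in the legend.
AB = frozenset({"A", "B"})
ABC = frozenset({"A", "B", "C"})
toy_support = {
    "atp1": {AB: 80.0, ABC: 95.0},
    "cob": {AB: 40.0, ABC: 70.0},
    "rpoB": {AB: 50.0, ABC: 30.0},
}

frequencies = support_frequency(toy_support)
for clade, freq in sorted(frequencies.items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), round(freq, 2), round(line_width(freq), 2))
```

A clade recovered with significant support in more genes gets a thicker branch; a clade supported by few genes is drawn thin even if it appears in the concatenation tree.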



