Review
Genome Biol Evol. 2011;3:571-87.
doi: 10.1093/gbe/evr050. Epub 2011 Jun 28.

Evaluating Phylogenetic Congruence in the Post-Genomic Era

Free PMC article

Jessica W Leigh et al. Genome Biol Evol. 2011;3:571-87.

Abstract

Congruence is a broadly applied notion in evolutionary biology. It is used to justify multigene phylogeny or phylogenomics, in studies of coevolution and lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity from character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence by comparing trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited to the growing number of available genome sequences, most of which are from prokaryotes and viruses, whether for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods that are suitable for analyzing the molecular evolution of the vast majority of life and that support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
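To make the "null hypothesis of uncorrelated tree structures" concrete, here is a minimal sketch of one test of this kind, a one-sided Mantel permutation test on two patristic distance matrices (the Mantel test is also used in the figure 5 analysis). The distance matrices, the function names and the 999-permutation default are illustrative assumptions, not values or code from the review. Rejecting this null only shows that two trees are more similar than unrelated trees would be; it does not show that they share one history, which is the concern raised above for phylogenomic data.

```python
import random
from statistics import fmean


def upper_triangle(d):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel permutation test for positive correlation.

    H0: the two distance matrices are uncorrelated. The null distribution is
    built by relabelling the taxa of d2 (rows and columns permuted together).
    """
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    n = len(d2)
    order = list(range(n))
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical patristic distances among five taxa on one gene tree.
marker_1 = [
    [0, 3, 8, 16, 17],
    [3, 0, 9, 17, 18],
    [8, 9, 0, 16, 17],
    [16, 17, 16, 0, 11],
    [17, 18, 17, 11, 0],
]
# A second hypothetical marker: same topology, every branch twice as long.
marker_2 = [[2 * v for v in row] for row in marker_1]

r, p = mantel(marker_1, marker_2)
print(f"r = {r:.3f}, P = {p:.3f}")
```

With only five taxa there are just 120 distinct relabellings, so the attainable P values are coarse; real analyses use many more taxa, and packages such as scikit-bio provide a tested implementation.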

Figures

FIG. 1.—
Scheme of the expected gene tree distributions for eukaryotic versus prokaryotic data sets. Each tree corresponds to an individual gene tree, and its color indicates the phylogenetic history of the gene. Monochromatic gene trees have undergone a single phylogenetic history; bichromatic trees show evidence of more than one distinct evolutionary history. Trees and branches with similar colors have closer evolutionary histories. Solid trees are strongly resolved; trees with dashed branches are poorly resolved for those branches. Boxes around some trees indicate gene trees that were either 1) frequently transferred horizontally (green-filled boxes) or 2) very rarely transferred horizontally (uncolored boxes). The expected forest of gene trees from eukaryotes, which is less variable and less patchy, is very different from that expected from prokaryotes and mobile elements.
FIG. 2.—
Pitfalls and possible improvements in incongruence analyses of prokaryotic forests of gene trees. The main steps of most currently available incongruence tests, as described in the main text, are shown together with their respective limitations (in red). The color code for gene trees is the same as in figure 1. In the bottom right corner, we suggest some groups of concordant gene trees worth identifying to better analyze forests of prokaryotic gene trees and of mobile elements; identifying them will, however, require refined incongruence analyses.
FIG. 3.—
Patchy taxonomic distributions and incongruence. In some cases, markers may appear homogeneous when only the taxa present in both markers are considered, even though their true histories are clearly incongruent. In (a), all taxa in the analysis are present; in (b), only a few members of one clan are present; in (c), members of one clan are completely absent. It is highly unlikely that the patchy presence of marker (b) among Archaebacteria can be explained by differential loss; it is more plausible that this marker was transferred from Eubacteria and then subsequently among archaebacterial lineages. Thus, although there is a split separating archaebacterial and eubacterial lineages, the history of marker (b) is incongruent with that of marker (a). In the case of marker (c), its complete absence from Archaebacteria suggests that it emerged in Eubacteria after their divergence from Archaebacteria.
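The pitfall in panel (b) can be reproduced with a minimal sketch. It assumes rooted trees encoded as nested Python tuples and uses hypothetical taxon labels (A1 to A4 for archaebacteria, E1 to E3 for eubacteria). Both trees are pruned to their shared taxa and their bipartitions are tested for pairwise compatibility, which stands in for a shared-taxa congruence check. The patchy marker passes that check, and the signal of transfer survives only in the presence/absence pattern that pruning discards.

```python
from itertools import product


def clades(tree):
    """Return (leaf set, list of descendant clades) for a rooted tree of nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
        found.append(child_leaves)
    return leaves, found


def splits_on(tree, taxa):
    """Non-trivial bipartitions of the unrooted tree after pruning it to `taxa`.

    Each split is stored as the side that does not contain a fixed anchor taxon.
    """
    anchor = min(taxa)
    result = set()
    for clade in clades(tree)[1]:
        side = clade & taxa
        other = taxa - side
        if len(side) < 2 or len(other) < 2:
            continue  # trivial split once pruned
        result.add(other if anchor in side else side)
    return result


def compatible(a, b):
    """Splits oriented away from the same anchor are compatible iff nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def congruent_on_shared_taxa(tree1, tree2):
    """True if every pair of splits is compatible once both trees are pruned to shared taxa."""
    shared = clades(tree1)[0] & clades(tree2)[0]
    s1, s2 = splits_on(tree1, shared), splits_on(tree2, shared)
    return all(compatible(a, b) for a, b in product(s1, s2))


def clan_coverage(tree, clan):
    """Fraction of a clan's members in which the marker is present."""
    return len(clades(tree)[0] & clan) / len(clan)


archaea = frozenset({"A1", "A2", "A3", "A4"})
marker_a = (("A1", "A2", "A3", "A4"), ("E1", "E2", "E3"))  # present in all taxa
marker_b = (("A1", "A2"), ("E1", "E2", "E3"))              # patchy among archaea
marker_c = (("A1", "E1"), ("A2", "E2", "E3"))              # conflicting topology

print(congruent_on_shared_taxa(marker_a, marker_b))  # True: looks homogeneous
print(clan_coverage(marker_b, archaea))              # 0.5: the patchiness pruning hides
print(congruent_on_shared_taxa(marker_a, marker_c))  # False: visible conflict
```

A method that models missing taxa would need to use something like `clan_coverage` alongside the topological comparison rather than discarding it.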
FIG. 4.—
Heatmap showing AU and SH test results with NUTs and their gene trees. The AU and SH tests were used to assess the support of each marker in the 100-gene NUTs data set for the ML gene trees in the data set, as well as the global tree inferred by ML from the concatenated data set. (a) AU test P values and (b) SH test P values. Each row represents an individual tree topology, whereas each column represents an individual marker. Names of markers and trees corresponding to each row and column are indicated; the row corresponding to the global tree is indicated by the blue-highlighted name “global” and by a box around the row of cells. Rows and columns are sorted according to dendrograms above and to the left of the heatmap, which indicate similarity in patterns of P values. The cells of the heatmaps are themselves colored according to the P values from the AU or SH test, such that very small P values (indicating rejection of a particular tree topology with a particular gene) are shown in darker green shades, whereas larger P values are shown in yellow, orange, or white.
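Reading such a heatmap amounts to thresholding a trees × markers table of P values. The sketch below assumes the AU or SH P values have already been computed elsewhere and only post-processes them. The table contents, the marker and tree names, and the conventional α = 0.05 are made-up illustrations, not results from the NUTs data set.

```python
ALPHA = 0.05  # assumed significance level; not stated in the caption

# Hypothetical P values: outer keys are tree topologies (rows), inner keys are markers (columns).
pvalues = {
    "global": {"m1": 0.42, "m2": 0.003, "m3": 0.31},
    "tree_m1": {"m1": 0.88, "m2": 0.001, "m3": 0.04},
    "tree_m2": {"m1": 0.01, "m2": 0.95, "m3": 0.02},
}


def rejecting_markers(table, tree, alpha=ALPHA):
    """Markers whose P value for `tree` falls below alpha, i.e. that reject that topology."""
    return sorted(marker for marker, p in table[tree].items() if p < alpha)


def rejection_counts(table, alpha=ALPHA):
    """Number of rejecting markers per tree (a dark-cell count for each heatmap row)."""
    return {tree: len(rejecting_markers(table, tree, alpha)) for tree in table}


print(rejecting_markers(pvalues, "global"))
print(rejection_counts(pvalues))
```

A marker that rejects the global tree (here the hypothetical `m2`) is a candidate for exclusion from concatenation, although the dendrogram ordering in the figure conveys richer patterns than these simple counts.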
FIG. 5.—
Hierarchically clustered pairwise CADM test P values. The CADM test rejected global incongruence of the data set (P < 0.001), indicating that at least one pair of markers was not incongruent over at least some part of their histories. We then assessed pairwise incongruence with Mantel tests and clustered the resulting P values hierarchically using a complete-linkage algorithm. Markers clustered above the threshold of 0.05 (indicated by a dashed red horizontal line) were considered homogeneous.
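Complete-linkage clustering with a fixed cut can be sketched as follows. The sketch assumes pairwise Mantel P values are used directly as dissimilarities, with a smaller P value meaning stronger evidence against incongruence, so two groups are merged only while every cross-pair P value stays below 0.05. How the original figure orients its axis is not stated in the caption. The marker names and P values are hypothetical.

```python
def complete_linkage_groups(names, pvals, threshold=0.05):
    """Agglomerative complete-linkage clustering, stopped at `threshold`.

    pvals maps frozenset({a, b}) to the pairwise P value for markers a and b.
    """
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Complete linkage: a merge is only as good as its worst pair.
        return max(pvals[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break  # no remaining merge keeps every pair under the threshold
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise Mantel P values for four markers.
pvals = {
    frozenset(("m1", "m2")): 0.001,
    frozenset(("m1", "m3")): 0.01,
    frozenset(("m2", "m3")): 0.02,
    frozenset(("m1", "m4")): 0.4,
    frozenset(("m2", "m4")): 0.5,
    frozenset(("m3", "m4")): 0.6,
}
print(complete_linkage_groups(["m1", "m2", "m3", "m4"], pvals))
```

Complete linkage is the conservative choice here: a marker joins a group only if it passes the threshold with every current member, not just with its closest one. SciPy's `linkage(..., method="complete")` followed by `fcluster(..., criterion="distance")` is a library route to the same kind of cut.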
FIG. 6.—
Similarity in homogeneous sets identified by CADM and Concaterpillar. (a) Venn diagram showing overlap in homogeneous sets identified by Concaterpillar (blue) and CADM (green) with the 41-taxon NUTs data set. (b) Venn diagram showing overlap in homogeneous sets identified by Concaterpillar with the 41-taxon (blue) and 100-taxon (red) data sets. One cluster found in the 100-taxon data set could not be represented in this Venn diagram; the members of this cluster (COG0081, COG0541, and COG2812) are indicated by an asterisk. Singletons (genes identified as incongruent with all others) identified by both methods are not shown.
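One way to compute the regions of such a Venn diagram is sketched below. It assumes each method returns a partition of markers into homogeneous sets, and it drops singletons as the caption describes. The marker names and groupings are hypothetical. A pair-level comparison is added because two methods can disagree on whole sets while still agreeing on many marker pairs.

```python
from itertools import combinations


def drop_singletons(groups):
    """Keep only homogeneous sets with at least two markers."""
    return {frozenset(g) for g in groups if len(g) > 1}


def venn_regions(groups_a, groups_b):
    """Sets found only by method A, by both methods, and only by method B."""
    a, b = drop_singletons(groups_a), drop_singletons(groups_b)
    return a - b, a & b, b - a


def co_clustered_pairs(groups):
    """All unordered marker pairs placed in the same set."""
    return {frozenset(pair) for g in groups for pair in combinations(sorted(g), 2)}


# Hypothetical output of two methods on six markers.
method_a = [{"m1", "m2"}, {"m3", "m4", "m5"}, {"m6"}]
method_b = [{"m1", "m2"}, {"m3", "m4"}, {"m5"}, {"m6"}]

only_a, both, only_b = venn_regions(method_a, method_b)
shared_pairs = co_clustered_pairs(method_a) & co_clustered_pairs(method_b)
print(len(only_a), len(both), len(only_b), len(shared_pairs))
```

Here only one whole set is shared, yet two of the four pairs grouped by the first method are also grouped by the second.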

