BMC Genomics. 2002;3:4.
doi: 10.1186/1471-2164-3-4. Epub 2002 Feb 5.

Bootstrap, Bayesian probability and maximum likelihood mapping: exploring new tools for comparative genome analyses

Olga Zhaxybayeva et al. BMC Genomics. 2002.

Abstract

Background: Horizontal gene transfer (HGT) played an important role in shaping microbial genomes. In addition to genes under sporadic selection, HGT also affects housekeeping genes and those involved in information processing, even ribosomal RNA encoding genes. Here we describe tools that provide an assessment and graphic illustration of the mosaic nature of microbial genomes.

Results: We adapted the Maximum Likelihood (ML) mapping to the analyses of all detected quartets of orthologous genes found in four genomes. We have automated the assembly and analyses of these quartets of orthologs given the selection of four genomes. We compared the ML-mapping approach to more rigorous Bayesian probability and Bootstrap mapping techniques. The latter two approaches appear to be more conservative than the ML-mapping approach, but qualitatively all three approaches give equivalent results. All three tools were tested on mitochondrial genomes, which presumably were inherited as a single linkage group.

Conclusions: In some instances of interphylum relationships we find nearly equal numbers of quartets strongly supporting each of the three possible topologies. In contrast, our analyses of genome quartets containing the cyanobacterium Synechocystis sp. indicate that a large part of the cyanobacterial genome is related to that of low GC Gram positives. Other groups that had been suggested as sister groups to the cyanobacteria contain far fewer genes that group with the Synechocystis orthologs. Interdomain comparisons of genome quartets containing the archaeon Halobacterium sp. revealed that Halobacterium sp. shares more genes with Bacteria that live in the same environment than with Bacteria that are more closely related based on rRNA phylogeny. Many of these genes encode proteins involved in substrate transport and metabolism and in information storage and processing. These analyses demonstrate that relationships among prokaryotes cannot be accurately depicted by or inferred from the tree-like evolution of a core of rarely transferred genes; rather, prokaryotic genomes are mosaics in which different parts have different evolutionary histories. Probability mapping is a valuable tool to explore the mosaic nature of genomes.
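
As a rough illustration of the ML-mapping step mentioned in the Results, the sketch below converts the log-likelihoods of the three quartet topologies into normalized weights p_i = L_i / (L_1 + L_2 + L_3). The abstract does not spell this formula out; it is assumed here from the likelihood-mapping approach the paper adapts, and the function name and input format are hypothetical.

```python
import math

def posterior_weights(log_likelihoods):
    """Normalize three topology log-likelihoods into weights that sum to 1.

    Assumption: weights follow p_i = L_i / (L_1 + L_2 + L_3), i.e. equal
    priors on the three unrooted quartet topologies.
    """
    if len(log_likelihoods) != 3:
        raise ValueError("a quartet has exactly three unrooted topologies")
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_likelihoods)
    scaled = [math.exp(value - peak) for value in log_likelihoods]
    total = sum(scaled)
    return [value / total for value in scaled]
```

Working with differences from the largest log-likelihood keeps the computation stable even when the raw likelihoods are far too small to represent directly.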

Figures

Figure 1
Star-Like Representation of Genome Relationships. The diagram depicts pairwise comparisons among thirteen genomes. Every genome is represented as a point on the perimeter of a circle. The thickness of the line connecting two genomes reflects the percentage of genes shared between them. The thickest line, connecting Aquifex aeolicus and Thermotoga maritima, corresponds to 51% shared genes, and the thinnest line, connecting Aeropyrum pernix and Borrelia burgdorferi, corresponds to 9% shared genes. A gene is considered shared when it has a BLAST hit in the other genome with an E-value below 10⁻⁸. The percentage of genes shared between genomes A and B is calculated as ((# of genes in A shared with B / total # of genes in A) + (# of genes in B shared with A / total # of genes in B)) / 2. Bacteria are depicted in green, Archaea in red and Eukaryotes in blue. The domain affiliation is also indicated by a letter following the species name (A: Archaea, B: Bacteria, and E: Eukaryotes).
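
The shared-gene percentage defined in this legend can be written as a small function. This is a minimal sketch: the function and parameter names are hypothetical, and the counts of shared genes are assumed to have been obtained beforehand (for example from BLAST hits passing the E-value cutoff).

```python
def shared_gene_percentage(shared_a_with_b, total_a, shared_b_with_a, total_b):
    """Average the two directional shared-gene fractions and return a percentage."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("total gene counts must be positive")
    fraction_a = shared_a_with_b / total_a  # share of A's genes with a hit in B
    fraction_b = shared_b_with_a / total_b  # share of B's genes with a hit in A
    return 100.0 * (fraction_a + fraction_b) / 2.0
```

Averaging the two directions keeps the measure symmetric even when the two genomes differ greatly in size.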
Figure 2
Mapping of the probability vector onto an equilateral triangle. Each QuartOP is represented as a probability vector P inside an equilateral triangle. The position of P is determined by the barycentric coordinates (p1, p2, p3), which correspond to the posterior probabilities or bootstrap support values of the three possible tree topologies. The vertices of the triangle T1, T2 and T3 represent the three possible unrooted tree topologies. Geometrically, each of the coordinates (p1, p2, p3) equals the distance between P and the side of the triangle opposite the corresponding vertex. Points closer to a vertex Ti have a larger corresponding probability pi and represent a more probable tree topology than the two alternatives. All the points are classified by their position in one of three zones: "total" zone, "90%" zone and "99%" zone, which are depicted schematically and not drawn to scale. In this diagram, point P corresponds to a dataset which has highest probability for the topology T3, but the probability is below 90%, so the point P is located in the "total" zone, but not in the 99% or 90% zone. Figure adapted from [24].
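
The mapping described in this legend can be sketched as follows. The triangle is given unit height so that each barycentric coordinate equals the distance from P to the side opposite the corresponding vertex. The vertex layout, the function names, and the choice to treat the zone boundaries as inclusive are assumptions made for illustration; the legend does not specify them.

```python
import math

SIDE = 2.0 / math.sqrt(3.0)  # side length of an equilateral triangle with height 1
# Assumed layout: T1 bottom left, T2 bottom right, T3 at the top.
VERTICES = [(0.0, 0.0), (SIDE, 0.0), (SIDE / 2.0, 1.0)]

def to_cartesian(p):
    """Map barycentric coordinates (p1, p2, p3) to a point (x, y) in the triangle."""
    if len(p) != 3 or min(p) < 0 or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("expected three non-negative values summing to 1")
    x = sum(weight * vx for weight, (vx, _) in zip(p, VERTICES))
    y = sum(weight * vy for weight, (_, vy) in zip(p, VERTICES))
    return x, y

def classify(p, thresholds=(0.90, 0.99)):
    """Return (best topology number, most stringent zone reached)."""
    best = max(range(3), key=lambda i: p[i])
    zone = "total"
    for threshold in sorted(thresholds):
        if p[best] >= threshold:  # inclusive boundary is an assumption
            zone = f"{threshold:.0%}"
    return best + 1, zone
```

For example, a vector such as (0.05, 0.10, 0.85) favors T3 but stays in the "total" zone, which matches the situation described for point P.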
Figure 3
Data flow for the genome quartet analysis. See Materials and Methods for details.
Figure 4
Maps of a genome quartet with organisms from four different bacterial phyla: Escherichia coli (Gram negative), Deinococcus radiodurans (Deinococcales), Bacillus subtilis (Gram positive) and Treponema pallidum (spirochete). Tree topologies assigned to the vertices are depicted in New Hampshire tree format near the corresponding vertex of the triangle, and they are equivalent to the unrooted tree topologies depicted in Figure 2. The three numbers associated with each tree topology indicate how many QuartOPs fall into each of the three zones: "total", 90% and 99%, respectively. For the definition of the zones see Figure 2. A) Probabilities are calculated according to Strimmer and von Haeseler [24]. No single topology is supported by the majority of the QuartOPs, and all three possible tree topologies are supported by roughly equal numbers of QuartOPs at the different probability levels. B) Probabilities are calculated with the MrBayes program [31]. C) Bootstrap support values are plotted. For this case the zones are "total", 70% and 90% support, respectively. Bootstrapping appears to provide a more conservative reliability estimate than the posterior probabilities used in cases A and B. Nevertheless, each tree topology is still supported by a roughly equal number of bootstrapped datasets.
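
The three counts reported next to each vertex could be tallied along the following lines. This is a sketch only: it assumes the zones are nested (a QuartOP in the 99% zone is also counted in the 90% and "total" zones), that ties go to the first topology, and that each QuartOP is supplied as a tuple of three support values. The function name is hypothetical.

```python
def tally_zones(vectors, thresholds=(0.90, 0.99)):
    """Count QuartOPs per topology in the "total" zone and in each threshold zone.

    Returns {topology: [total, count >= thresholds[0], count >= thresholds[1], ...]}.
    """
    counts = {topology: [0] * (1 + len(thresholds)) for topology in (1, 2, 3)}
    for p in vectors:
        best = max(range(3), key=lambda i: p[i])  # index of the favored topology
        counts[best + 1][0] += 1
        for position, threshold in enumerate(thresholds, start=1):
            if p[best] >= threshold:
                counts[best + 1][position] += 1
    return counts
```

For the bootstrap map in panel C, the same function would be called with thresholds=(0.70, 0.90).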
Figure 5
Distribution among different functional categories for those datasets that support one of the three topologies with better than 99% posterior probability. Tree topologies are indicated by column numbers 1, 2 and 3. Column 1 corresponds to topology ((1,4),2,3); columns 2 and 3 correspond to topologies ((1,3),2,4) and ((1,2),3,4), respectively. Divisions into functional categories are adopted from the COG database [27]. Functional categories are aggregated into four broad functional meta-categories. Distributions of datasets among the meta-categories are plotted as pie charts for each tree topology. In this case all three topologies are supported by roughly equal numbers of datasets from each meta-category.
Figure 6
Alignment of mitochondrial cytochrome oxidase subunit II. The alignment is for the control mitochondrial quartet m7 (see Table 1), which supports the unexpected ((Homo sapiens, Cafeteria), Saccharomyces, Arabidopsis) topology. The exact matches for each tree topology are shown in three different colors. Blue corresponds to the ((Homo sapiens, Cafeteria), Saccharomyces, Arabidopsis) topology, yellow to the ((Homo sapiens, Arabidopsis), Saccharomyces, Cafeteria) topology and green to the ((Homo sapiens, Saccharomyces), Arabidopsis, Cafeteria) topology. As can be seen, the majority of the matches favor the ((Homo sapiens, Cafeteria), Saccharomyces, Arabidopsis) topology. There are nine parsimony informative positions favoring this topology, and only three for each of the other two topologies.
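
Counting parsimony informative positions in a four-sequence alignment can be sketched as below. For a quartet, a column is informative when two sequences share one residue and the other two share a different one; the pairing then indicates which topology the column favors. The function name, the topology labels, and the set of characters treated as gaps or unknowns are assumptions for illustration, not the procedure used to produce the figure.

```python
def count_informative_sites(s1, s2, s3, s4, skip_chars="-?"):
    """Count alignment columns supporting each of the three quartet topologies."""
    if len({len(s1), len(s2), len(s3), len(s4)}) != 1:
        raise ValueError("aligned sequences must have equal length")
    counts = {"((1,2),3,4)": 0, "((1,3),2,4)": 0, "((1,4),2,3)": 0}
    for a, b, c, d in zip(s1, s2, s3, s4):
        if any(ch in skip_chars for ch in (a, b, c, d)):
            continue  # ignore columns with gaps or unknown residues
        if a == b and c == d and a != c:
            counts["((1,2),3,4)"] += 1
        elif a == c and b == d and a != b:
            counts["((1,3),2,4)"] += 1
        elif a == d and b == c and a != b:
            counts["((1,4),2,3)"] += 1
    return counts
```

Columns that are constant, or in which three or more distinct residues occur, fall through all three branches and are not counted.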
Figure 7
ML map of the quartet representing Bacillus subtilis, the deep-branching bacteria T. maritima and A. aeolicus, and the salt-loving archaeon Halobacterium sp. The majority of the orthologous datasets support the grouping of Halobacterium with Bacillus subtilis. The topology that corresponds to the 16S rRNA topology (lower left vertex) is supported by the fewest orthologous datasets. The result stayed qualitatively the same when B. subtilis was replaced with the cyanobacterium Synechocystis sp. (see results for quartet #11 in Table 3). For details on the figure notation see the legend for Figure 4. A) Probabilities calculated according to Strimmer and von Haeseler [24]. B) Probabilities calculated with the MrBayes program [31].

References

    1. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A. 1977;74:5088–5090.
    2. Woese CR. Bacterial evolution. Microbiol Rev. 1987;51:221–271.
    3. Hennig W. Phylogenetic systematics. Urbana, University of Illinois Press. 1966.
    4. Doolittle WF. Phylogenetic classification and the universal tree. Science. 1999;284:2124–2129. doi: 10.1126/science.284.5423.2124.
    5. Ludwig W, Strunk O, Klugbauer S, Klugbauer N, Weizenegger M, Neumaier J, Bachleitner M, Schleifer KH. Bacterial phylogeny based on comparative sequence analysis. Electrophoresis. 1998;19:554–568.