Syst Biol. 2018 Mar 1;67(2):320-327.
doi: 10.1093/sysbio/syx080.

Probabilistic Distances Between Trees



Maryam K Garba et al. Syst Biol. 2018.

Erratum in

Abstract

Most existing measures of distance between phylogenetic trees are based on the geometry or topology of the trees. Instead, we consider distance measures that are based on the underlying probability distributions on genetic sequence data induced by trees. Monte Carlo schemes are necessary to calculate these distances approximately, and we describe efficient sampling procedures. Key features of the distances are the ability to include substitution model parameters and to handle trees with different taxon sets in a principled way. We demonstrate some of the properties of these new distance measures and compare them to existing distances, in particular by applying multidimensional scaling to data sets previously reported as containing phylogenetic islands. [Metric; probability distribution; multidimensional scaling; information geometry.]
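As a rough illustration of the Monte Carlo idea in the abstract, the sketch below estimates a Hellinger distance between the site-pattern distributions induced by two very small trees. It is not the authors' sampling procedure. It assumes a two-taxon tree under the two-state symmetric model, where a path of length t (expected substitutions per site) gives a probability (1 - exp(-2t))/2 that the two tips differ, and it assumes the normalization H^2 = 1 - sum_x sqrt(p(x) q(x)), which may differ from the paper's by a constant factor. The function names are hypothetical.

```python
import math
import random


def pattern_probs(t):
    """Site-pattern distribution for a two-taxon tree (assumed toy setting)
    under the two-state symmetric model, with total path length t."""
    p_diff = 0.5 * (1.0 - math.exp(-2.0 * t))
    p_same = 1.0 - p_diff
    # Uniform root state: the two "same" patterns share p_same equally,
    # and the two "different" patterns share p_diff equally.
    return {"00": p_same / 2, "11": p_same / 2,
            "01": p_diff / 2, "10": p_diff / 2}


def hellinger_exact(p, q):
    """Exact Hellinger distance, feasible only when patterns can be enumerated."""
    bc = sum(math.sqrt(p[x] * q[x]) for x in p)  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def hellinger_mc(p, q, n_samples, rng):
    """Monte Carlo estimate: draw patterns from p and average sqrt(q/p),
    which is an unbiased estimate of the Bhattacharyya coefficient."""
    patterns = list(p)
    weights = [p[x] for x in patterns]
    draws = rng.choices(patterns, weights=weights, k=n_samples)
    bc_hat = sum(math.sqrt(q[x] / p[x]) for x in draws) / n_samples
    return math.sqrt(max(0.0, 1.0 - bc_hat))


if __name__ == "__main__":
    p = pattern_probs(0.1)
    q = pattern_probs(0.5)
    rng = random.Random(0)
    print("exact:      ", hellinger_exact(p, q))
    print("Monte Carlo:", hellinger_mc(p, q, 20000, rng))
```

For realistic trees the number of site patterns grows exponentially with the number of taxa, so the exact sum is not available and only a sampling estimate of this kind is practical. In that setting the pattern probabilities would be computed as tree likelihoods rather than looked up in a table.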

Figures

Figure 1.
A histogram of estimated values of formula image for the comparison of every pair of gene trees in the data set of 106 gene trees due to Rokas et al. 2003, using the two-state symmetric model and JS distance. It can be seen that the distance between most pairs of trees in the data set can be estimated accurately with fewer than 2000 samples. In terms of computational cost, this is similar to the cost of computing the likelihood for an alignment of length 2000 for each pair of trees.
Figure 2.
JS distance and BHV metric between two random 16-taxon trees formula image and formula image with branch lengths scaled by a factor formula image.
Figure 3.
a) Hellinger distance between the two trees shown in b), as a function of formula image, the length of the long pendant edges. The BHV distance does not vary with formula image.
Figure 4.
Sampling distribution of distance between two trees for different levels of random deletions of taxa using a) augmented method with Hellinger distance, and b) common taxa method with the BHV metric. The initial pair of trees have the same 100 taxa with the same topology but different (random) edge lengths. The dashed horizontal line is the distance/metric between the initial pair of trees (before any deletions).
Figure 5.
Sampling distribution of distance between two trees for different levels of random deletions of taxa using a) augmented method with Hellinger distance, and b) common taxa method with the BHV metric. Both trees have 100 taxa with different (random) edge lengths, with one tree generated at random and the other tree determined using 10 subsequent SPR operations. The dashed horizontal line is the distance/metric between the initial pair of trees (before any deletions).
Figure 6.
Comparison of Hellinger distance between pairs of trees formula image and formula image using overall ML substitution parameters formula image with that using individual ML substitution parameter formula image. The trees used are ML trees formula image obtained from 100 bootstrap replicates of the primate data set.
Figure 7.
MDS of the pairwise Hellinger distance between a) a posterior sample of 1000 trees from the tetrapod data set under the GTR+Γ model and b) a posterior sample of 500 trees from the dengue fever data set under the GTR+Γ+I substitution model with an uncorrelated lognormal-distributed relaxed molecular clock.


References

    1. Allman E.S., Ané C., Rhodes J.A. 2008. Identifiability of a Markovian model of molecular evolution with Gamma-distributed rates. Adv. Appl. Probab. 40:229–249.
    2. Billera L.J., Holmes S.P., Vogtmann K. 2001. Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27:733–767.
    3. Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214.
    4. Estabrook G.F., McMorris F.R., Meacham C.A. 1985. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst. Biol. 34:193–200.
    5. Felsenstein J. 2008. Inferring phylogenies. Sunderland, MA: Sinauer Associates, Inc.
