Large-scale phylogenies and measuring the performance of phylogenetic estimators

J Kim

doi:10.1080/106351598261021

Syst Biol. 1998 Mar;47(1):43-60. doi: 10.1080/106351598261021.

Author

J Kim¹

Affiliation

¹ Department of Biology, Yale University, New Haven, Connecticut 06511, USA. junhyong_kim@quickmail.yale.edu

PMID: 12064240

Abstract

Performance measures of phylogenetic estimation methods, such as accuracy, consistency, and power, attempt to summarize the ensemble of a given estimator's behavior. Because each summary reduces that ensemble behavior to a single number, a variety of definitions has arisen. In particular, the relationships between different performance measures, such as accuracy and consistency or accuracy and error, depend on the exact definition of these measures. In addition, it is relatively common to use large-sample behavior to infer similar behavior for small samples. In fact, large-sample results such as the claimed asymptotic efficiency of the maximum-likelihood estimator are often uninformative for small samples. Conversely, small-sample behavior observed in simulations is sometimes used to imply large-sample behavior such as consistency. However, such extrapolation is often difficult. How the performance of a phylogenetic estimator scales with the addition of taxa must be qualified with respect to whether the whole tree or only a fixed subset of taxa is being estimated. It must also be qualified with respect to how tree models are sampled. Over the ensemble of all possible trees of a given size, the performance of estimators of the whole tree suffers as the tree size becomes larger. However, under certain models of cladogenesis, the estimate can improve with the addition of taxa. In fact, at every number of taxa there are subsets of tree models that are easier to estimate than others. This suggests that, with judicious addition or subtraction of taxa, we can move from tree models that are more difficult to estimate at one number of taxa to those that are easier to estimate at another number of taxa.
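
As a rough illustration of why whole-tree estimation over the ensemble of all possible trees becomes harder as taxa are added, the sketch below counts the unrooted, fully bifurcating labeled topologies for n taxa using the standard combinatorial result (2n - 5)!!. It is not taken from the paper: the function names are hypothetical, and the "chance accuracy" shown simply assumes that a topology is picked uniformly at random, which is only one of the many possible definitions of accuracy the abstract alludes to.

```python
from fractions import Fraction


def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating labeled topologies: (2n - 5)!!.

    Hypothetical helper for illustration; defined for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def chance_accuracy(n_taxa: int) -> Fraction:
    """Probability of hitting the true topology by a uniform random guess.

    Assumption: "accuracy" here means recovering the entire topology exactly.
    """
    return Fraction(1, num_unrooted_binary_trees(n_taxa))


if __name__ == "__main__":
    for n in (4, 5, 6, 10, 20):
        print(n, num_unrooted_binary_trees(n), float(chance_accuracy(n)))
```

The count depends only on the number of taxa, so it says nothing about the abstract's other point that some subsets of tree models are easier to estimate than others at any given size.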

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Analysis of Variance
Classification / methods
MEDLINE
Observer Variation
Phylogeny*
Reproducibility of Results