No BLUE among phylogenetic estimators

J Math Biol. 1999 Nov;39(5):421-38. doi: 10.1007/s002850050173.


Multivariate analysis is a branch of statistics that successfully exploits the powerful tools of linear algebra to obtain a fairly comprehensive theory of estimation. The purpose of this paper is to explore to what extent a linear theory of estimation can be developed in the context of coalescent models used in the analysis of DNA polymorphism. We consider a large class of coalescent models, of which the neutral infinite sites model is one example. In the process, we discover several limitations of linear estimators that are quite distinct from those in the classical theory. In particular, we prove that there does not exist a uniformly BLUE (best linear unbiased estimator) for the scaled mutation parameter, under the assumptions of the neutral model of evolution. In fact, we show that no linear estimator performs uniformly better than the Watterson (1975) method based on the total number of segregating sites. For certain coalescent models, the segregating-sites estimator is actually optimal. The general conclusion is the following. If genealogical information is useful for estimating the rate of evolution, then there is no optimal linear method. If there is an optimal linear method, then no information other than the total number of segregating sites is needed.
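For readers unfamiliar with the segregating-sites estimator mentioned above, the following is a minimal sketch of its standard textbook form under the neutral infinite sites model: the number of segregating sites S divided by the harmonic sum a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences. The function and parameter names are illustrative assumptions and do not come from the paper.

```python
def watterson_theta(segregating_sites: int, sample_size: int) -> float:
    """Segregating-sites estimate of the scaled mutation parameter.

    Standard form: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.
    Names here are hypothetical; the paper does not prescribe an implementation.
    """
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    if segregating_sites < 0:
        raise ValueError("segregating_sites must be non-negative")
    # Harmonic sum over 1 .. n-1
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


if __name__ == "__main__":
    # Made-up illustrative input: 11 segregating sites in a sample of 4 sequences.
    print(watterson_theta(11, 4))
```

The estimator uses only the total count S and ignores how mutations are distributed over the genealogy, which is the sense in which the abstract contrasts it with methods that use genealogical information.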

MeSH terms

  • Base Sequence / genetics
  • Evolution, Molecular*
  • Linear Models
  • Models, Biological*
  • Multivariate Analysis
  • Mutation
  • Phylogeny*
  • Polymorphism, Genetic / genetics*