Gene Genealogies When the Sample Size Exceeds the Effective Size of the Population

Mol Biol Evol. 2003 Feb;20(2):208-13. doi: 10.1093/molbev/msg024.


We study the properties of gene genealogies for large samples using a continuous approximation introduced by R. A. Fisher. We show that the major effect of large sample size, relative to the effective size of the population, is to increase the proportion of polymorphisms at which the mutant type is found in a single copy in the sample. We derive analytical expressions for the expected number of these singleton polymorphisms and for the total number of polymorphic, or segregating, sites that are valid even when the sample size is much greater than the effective size of the population. We use simulations to assess the accuracy of these predictions and to investigate other aspects of large-sample genealogies. Lastly, we apply our results to some data from Pacific oysters sampled from British Columbia. This illustrates that, when large samples are available, it is possible to estimate the mutation rate and the effective population size separately, in contrast to the case of small samples in which only the product of the mutation rate and the effective population size can be estimated.
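The singleton effect described above can be illustrated with a small simulation. The sketch below is not the authors' method (it does not use Fisher's continuous approximation or the paper's analytical expressions). It assumes a haploid Wright-Fisher population of N gene copies in which the n sampled copies are offspring that each pick a parent uniformly at random, so n may exceed N. The function names and the parameters `reps` and `seed` are hypothetical. Under the infinite-sites model the expected share of polymorphisms with the mutant in a single copy equals the external branch length divided by the total branch length, which for the standard coalescent is 1/(1 + 1/2 + ... + 1/(n-1)).

```python
import random


def wf_branch_lengths(n, N, rng):
    """Trace n sampled copies back through a haploid Wright-Fisher
    population of N copies (assumed model, see lead-in).

    Returns (total_length, external_length) in generations, where
    external branches are those ancestral to exactly one sampled copy.
    """
    # Each lineage is represented by the number of sampled copies below it.
    lineages = [1] * n
    total = 0
    external = 0
    while len(lineages) > 1:
        # Every current lineage persists for one more generation.
        total += len(lineages)
        external += sum(1 for size in lineages if size == 1)
        # Each lineage picks a parent; lineages sharing a parent merge.
        # Several lineages may merge at once when n is large relative to N.
        parents = {}
        for size in lineages:
            p = rng.randrange(N)
            parents[p] = parents.get(p, 0) + size
        lineages = list(parents.values())
    return total, external


def singleton_fraction(n, N, reps=200, seed=1):
    """Ratio of mean external to mean total branch length over reps trees."""
    rng = random.Random(seed)
    total_sum = 0
    external_sum = 0
    for _ in range(reps):
        total, external = wf_branch_lengths(n, N, rng)
        total_sum += total
        external_sum += external
    return external_sum / total_sum


def coalescent_singleton_fraction(n):
    """Standard-coalescent expectation: 1 / sum_{i=1}^{n-1} 1/i."""
    return 1.0 / sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    for n, N in [(5, 500), (500, 50)]:
        print(n, N, singleton_fraction(n, N), coalescent_singleton_fraction(n))
```

When n is much smaller than N, the simulated fraction should sit close to the standard-coalescent value. When n exceeds N, most sampled copies share a parent one generation back, so the external branches account for a much larger share of the tree than the small-sample formula predicts.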

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA, Mitochondrial / genetics
  • Genetics, Population / methods*
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Mutation
  • Polymorphism, Genetic
  • Sample Size


Substances

  • DNA, Mitochondrial