On incomplete sampling under birth-death models and connections to the sampling-based coalescent

J Theor Biol. 2009 Nov 7;261(1):58-66. doi: 10.1016/j.jtbi.2009.07.018. Epub 2009 Jul 23.


The constant rate birth-death process is used as a stochastic model for many biological systems, for example phylogenies or disease transmission. As the biological data are usually not fully available, it is crucial to understand the effect of incomplete sampling. In this paper, we analyze the constant rate birth-death process with incomplete sampling. We derive the density of the bifurcation events for trees on n leaves which evolved under this birth-death-sampling process. This density is used for calculating prior distributions in Bayesian inference programs and for efficiently simulating trees. We show that the birth-death-sampling process can be interpreted as a birth-death process with reduced rates and complete sampling. This shows that joint inference of birth rate, death rate and sampling probability is not possible. The birth-death-sampling process is compared to the sampling-based population genetics model, the coalescent. It is shown that despite many similarities between these two models, the distribution of bifurcation times remains different even in the case of very large population sizes. We illustrate these findings on a Hepatitis C virus dataset from Egypt. We show that the transmission time estimates are significantly different: the widely used Gamma statistic even changes its sign from negative to positive when switching from the coalescent to the birth-death process.
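
The statement that the birth-death-sampling process behaves like a complete-sampling process with reduced rates can be checked numerically. The sketch below is illustrative only and is not taken from the abstract. It assumes the reparameterization (birth, death, rho) -> (rho * birth, death - birth * (1 - rho), 1) and the commonly quoted closed form for the density of a single bifurcation time under extant-tip sampling. The function names, parameter names, and numeric values are hypothetical.

```python
import math


def reduced_rates(birth, death, rho):
    """Map (birth, death, rho) to complete-sampling rates.

    Assumed reparameterization (not stated explicitly in the abstract):
    birth' = rho * birth, death' = death - birth * (1 - rho), rho' = 1.
    """
    return rho * birth, death - birth * (1.0 - rho)


def rate_weighted_density(t, birth, death, rho):
    """Unnormalized density of a bifurcation at time t before the present.

    Assumed closed form: birth * p1(t), where p1(t) is the probability that
    a lineage alive at time t has exactly one sampled descendant today.
    Requires birth != death.
    """
    r = birth - death
    decay = math.exp(-r * t)
    denom = rho * birth + (birth * (1.0 - rho) - death) * decay
    return birth * rho * r * r * decay / (denom * denom)


# Hypothetical parameter values, chosen only for illustration.
birth, death, rho = 2.0, 1.5, 0.5
birth_c, death_c = reduced_rates(birth, death, rho)

for t in (0.1, 1.0, 5.0):
    sampled = rate_weighted_density(t, birth, death, rho)
    complete = rate_weighted_density(t, birth_c, death_c, 1.0)
    print(f"t={t}: sampled={sampled:.6f}, complete={complete:.6f}")
```

Under these assumptions the two columns agree for every t, which is the sense in which the three parameters cannot be separated from bifurcation times alone: only the growth rate (birth - death) and the product rho * birth enter the density. Note that the reduced death rate can become negative for some parameter choices, so it is best read as a formal parameter rather than a biological rate.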

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Birth Rate*
  • Egypt / epidemiology
  • Hepatitis C / epidemiology
  • Hepatitis C / transmission
  • Humans
  • Models, Biological*
  • Phylogeny
  • Population Density
  • Population Dynamics
  • Stochastic Processes