Exact and efficient phylodynamic simulation from arbitrarily large populations

ArXiv [Preprint]. 2024 Feb 27:arXiv:2402.17153v1.

Abstract

Many biological studies involve inferring the genealogical history of a sample of individuals from a large population and interpreting the reconstructed tree. Such an ascertained tree typically represents only a small part of a comprehensive population tree and is distorted by survivorship and sampling biases. Inferring evolutionary parameters from ascertained trees requires modeling both the underlying population dynamics and the ascertainment process. A crucial component of this phylodynamic modeling involves tree simulation, which is used to benchmark probabilistic inference methods. To simulate an ascertained tree, one must first simulate the full population tree and then prune unobserved lineages. Consequently, the computational cost is determined not by the size of the final simulated tree, but by the size of the population tree in which it is embedded. In most biological scenarios, simulations of the entire population are prohibitively expensive due to computational demands placed on lineages without sampled descendants. Here, we address this challenge by proving that, for any partially ascertained process from a general multi-type birth-death-mutation-sampling (BDMS) model, there exists an equivalent pure birth process (i.e., no death) with mutation and complete sampling. The final trees generated under these processes have exactly the same distribution. Leveraging this property, we propose a highly efficient algorithm for simulating trees under a general BDMS model. Our algorithm scales linearly with the size of the final simulated tree and is independent of the population size, enabling simulations from extremely large populations beyond the reach of current methods but essential for various biological applications. We anticipate that this unprecedented speedup will significantly advance the development of novel inference methods that require extensive training data.
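The cost argument above (simulate the whole population tree first, prune afterwards) can be illustrated with a minimal sketch. This is not the algorithm proposed in the preprint, and it does not implement the equivalent pure birth process. It is only the naive baseline under strong simplifying assumptions: a single type, constant birth and death rates, no mutation, and sampling only at the final time with probability rho. The function name `simulate_full_then_prune` and all of its parameters are hypothetical and chosen for illustration.

```python
import random


def simulate_full_then_prune(birth, death, rho, t_end, seed=0):
    """Naive baseline: simulate every lineage forward in time, then prune.

    Assumptions (not from the preprint): one type, constant per-lineage
    rates `birth` and `death`, and sampling of each lineage alive at
    `t_end` independently with probability `rho`.
    """
    rng = random.Random(seed)
    parent = [None]   # parent[i] is the parent lineage of lineage i
    alive = [0]       # ids of lineages currently alive; starts with the root
    t = 0.0
    events = 0        # number of birth/death events simulated

    while alive:
        # Gillespie step: waiting time to the next event in the population.
        t += rng.expovariate((birth + death) * len(alive))
        if t >= t_end:
            break
        events += 1
        # Pick the affected lineage uniformly and remove it in O(1).
        i = rng.randrange(len(alive))
        node = alive[i]
        alive[i] = alive[-1]
        alive.pop()
        if rng.random() < birth / (birth + death):
            # Birth: the lineage is replaced by two child lineages.
            for _ in range(2):
                parent.append(node)
                alive.append(len(parent) - 1)
        # Otherwise it is a death: the lineage simply disappears.

    # Ascertainment: keep each survivor with probability rho.
    sampled = [n for n in alive if rng.random() < rho]

    # Pruning: keep only lineages ancestral to at least one sample.
    # (Unary nodes are not collapsed in this sketch.)
    keep = set()
    for n in sampled:
        while n is not None and n not in keep:
            keep.add(n)
            n = parent[n]

    return {
        "events": events,
        "full_nodes": len(parent),
        "survivors": len(alive),
        "sampled": len(sampled),
        "kept_nodes": len(keep),
    }


if __name__ == "__main__":
    # Work done scales with "full_nodes", not with "kept_nodes".
    print(simulate_full_then_prune(birth=1.0, death=0.5, rho=0.01, t_end=12.0))
```

Comparing `full_nodes` with `kept_nodes` for a small `rho` shows how much of the simulated population tree is discarded by pruning in this baseline, which is the overhead the abstract says the proposed method avoids.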

Publication types

  • Preprint