Over the past decade, pathogen genome sequencing has become well established as a powerful approach to study infectious disease epidemiology. In particular, when multiple genomes are available from several geographical locations, comparing them is informative about the relative size of the local pathogen populations as well as past migration rates and events between locations. The structured coalescent model has a long history of being used as the underlying process for such phylogeographic analysis. However, the computational cost of using this model does not scale well to the large number of genomes frequently analysed in pathogen genomic epidemiology studies. Several approximations of the structured coalescent model have been proposed, but their effects are difficult to predict. Here we show how the exact structured coalescent model can be used to analyse a precomputed dated phylogeny, in order to perform Bayesian inference on the past migration history, the effective population sizes in each location, and the directed migration rates from any location to another. We describe an efficient reversible jump Markov Chain Monte Carlo scheme which is implemented in a new R package StructCoalescent. We use simulations to demonstrate the scalability and correctness of our method and to compare it with existing software. We also applied our new method to several state-of-the-art datasets on the population structure of real pathogens to showcase the relevance of our method to current data scales and research questions.
Copyright: © 2025 Roberts et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.