Efficient Bayesian Phylogenetics under the Infinite Sites Model

bioRxiv [Preprint]. 2025 Nov 16:2025.11.14.688551. doi: 10.1101/2025.11.14.688551.

Abstract

Bayesian phylogenetic inference from molecular sequences can provide key insights into the evolutionary history of populations. Existing tools, however, often scale poorly with sample size. We present inPhynite, a highly efficient Bayesian phylogenetics algorithm for genomic datasets compatible with the infinite sites mutation model. A key advantage of this model is that likelihood calculation, which typically incurs a substantial computational cost, becomes trivial. We show that, under the infinite sites assumption, it is possible to sample a coarse space of mutations and coalescences from which complete phylogenetic trees can be recovered. We design an efficient Markov chain that explores this space jointly with effective population size trajectories, which are modeled as piecewise constant functions. On both real and synthetic data, our method significantly outperforms competing methods, achieving a more than 225-fold speedup in statistical efficiency on large datasets without any loss in accuracy. Finally, we demonstrate how inPhynite can help us understand the evolutionary history and past effective population sizes of human populations from mitochondrial DNA.
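The abstract relies on two standard ingredients, which the minimal Python sketch below illustrates. It is not inPhynite's implementation, and all function names are hypothetical.

The first ingredient is compatibility with the infinite sites model. Assume binary haplotypes in which `'0'` is the known ancestral state and `'1'` the derived state at every site. A dataset can then be explained by a rooted tree in which each site mutates at most once exactly when, for every pair of sites, the sets of samples carrying the derived allele are either disjoint or nested.

The second ingredient is the coalescent log-density of a tree's coalescence times under a piecewise constant effective population size. This assumes isochronous samples and time measured so that each specific pair of lineages merges at rate 1/N(t).

```python
import bisect
import math


def is_infinite_sites_compatible(haplotypes):
    """Pairwise disjoint-or-nested test on derived-allele carrier sets.

    Assumes rows are samples, columns are sites, '0' is ancestral.
    """
    n_sites = len(haplotypes[0])
    carriers = [
        frozenset(r for r, h in enumerate(haplotypes) if h[s] == "1")
        for s in range(n_sites)
    ]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            a, b = carriers[i], carriers[j]
            # Overlapping but non-nested carrier sets would need a repeat mutation.
            if a & b and not (a <= b or b <= a):
                return False
    return True


def size_at(t, change_points, sizes):
    """N(t) for a piecewise constant trajectory (len(sizes) == len(change_points) + 1)."""
    return sizes[bisect.bisect_right(change_points, t)]


def inverse_size_integral(a, b, change_points, sizes):
    """Integral of 1/N(t) over [a, b], summed segment by segment."""
    bounds = [0.0] + list(change_points) + [math.inf]
    total = 0.0
    for m, size in enumerate(sizes):
        lo, hi = max(a, bounds[m]), min(b, bounds[m + 1])
        if hi > lo:
            total += (hi - lo) / size
    return total


def coalescent_log_density(coal_times, change_points, sizes):
    """Log-density of coalescence times plus ranked labeled topology.

    Samples are all taken at time 0; time runs backwards into the past.
    """
    n = len(coal_times) + 1
    log_p, prev = 0.0, 0.0
    for i, t in enumerate(sorted(coal_times)):
        k = n - i  # lineages present just before this coalescence
        log_p -= math.log(size_at(t, change_points, sizes))
        log_p -= k * (k - 1) / 2 * inverse_size_integral(prev, t, change_points, sizes)
        prev = t
    return log_p
```

For example, `coalescent_log_density([1.0], [0.5], [1.0, 2.0])` evaluates two lineages that merge at time 1. The population size is 1 before time 0.5 and 2 afterwards. The segment-wise integral keeps the cost linear in the number of trajectory segments, which is one reason piecewise constant trajectories are convenient in MCMC.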

Keywords: Bayesian; animalia; coalescent; effective population size; infinite sites; phylogenetics.

Publication types

  • Preprint