Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST

Bioinformatics. 2017 Jun 15;33(12):1798-1805. doi: 10.1093/bioinformatics/btx088.

Abstract

Motivation: Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes. This requires estimating a large number of evolutionary parameters, which increases the computational burden of such analyses. The past two decades have also seen the rise of multi-core processors in both the central processing unit (CPU) and graphics processing unit (GPU) markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses.

Results: Here we propose a Markov chain Monte Carlo (MCMC) approach that uses an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach estimates these multipartite parameters more efficiently than standard approaches, which typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves more than 14-fold.
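
To illustrate what an adaptive multivariate transition kernel can look like, the following is a minimal sketch of a generic adaptive Metropolis sampler in plain Python. It is not the BEAST implementation, and the abstract does not give the details of the kernel used there. The function names, the toy target (a bivariate normal with correlation 0.9), the fixed initial step size of 0.1, the `adapt_start` burn-in and the `eps` regularizer are all assumptions made for illustration; the 2.38^2/d scaling is the conventional choice from the adaptive Metropolis literature. The sketch proposes all parameters jointly from a multivariate normal whose covariance is learned from the chain history, and it does not show the parallel evaluation across partitions.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def correlated_normal_logpdf(x, rho=0.9):
    """Toy target (assumption): unnormalized log density of a correlated bivariate normal."""
    return -0.5 * (x[0] ** 2 - 2.0 * rho * x[0] * x[1] + x[1] ** 2) / (1.0 - rho ** 2)


def adaptive_metropolis(log_target, x0, n_iter, adapt_start=200, eps=1e-6, seed=1):
    """Random-walk Metropolis with a proposal covariance learned from the chain."""
    rng = random.Random(seed)
    d = len(x0)
    scale = 2.38 ** 2 / d              # conventional adaptive Metropolis scaling
    x, lp = list(x0), log_target(x0)
    mean = list(x0)                    # running mean of visited states
    scatter = [[0.0] * d for _ in range(d)]  # running sum of outer products
    # Fixed isotropic proposal (std 0.1) until adaptation starts.
    chol = [[0.1 if i == j else 0.0 for j in range(d)] for i in range(d)]
    samples, accepted = [], 0
    for t in range(1, n_iter + 1):
        # Joint proposal for all parameters: x + L z, with z ~ N(0, I).
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        prop = [x[i] + sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        lp_prop = log_target(prop)
        # Metropolis accept/reject; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        samples.append(list(x))
        # Welford-style update of the running mean and scatter matrix.
        n = t + 1                      # the starting point counts as one observation
        delta = [x[i] - mean[i] for i in range(d)]
        mean = [mean[i] + delta[i] / n for i in range(d)]
        delta2 = [x[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                scatter[i][j] += delta[i] * delta2[j]
        if t >= adapt_start:
            # Proposal covariance = scale * (sample covariance + eps * I).
            prop_cov = [[scale * (scatter[i][j] / (n - 1) + (eps if i == j else 0.0))
                         for j in range(d)] for i in range(d)]
            chol = cholesky(prop_cov)
    return samples, accepted / n_iter


samples, rate = adaptive_metropolis(correlated_normal_logpdf, [0.0, 0.0], 5000)
print("acceptance rate:", round(rate, 3))
```

Because the proposal covariance tracks the correlation between parameters, a single joint move can follow the ridge of the target, whereas a mixture of univariate kernels would have to take many small axis-aligned steps. This is the general intuition behind the efficiency gain reported above, although the size of any gain depends on the target and on the implementation.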

Availability and implementation: Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference.

Contact: guy.baele@kuleuven.be.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Bayes Theorem
  • Computational Biology / methods
  • Evolution, Molecular*
  • Markov Chains
  • Monte Carlo Method
  • Phylogeny*
  • Sequence Analysis, DNA / methods*
  • Software*