Simulating trees with millions of species

Bioinformatics. 2020 May 1;36(9):2907-2908. doi: 10.1093/bioinformatics/btaa031.

Abstract

Motivation: The birth-death (BD) model constitutes the theoretical backbone of most phylogenetic tools for reconstructing speciation/extinction dynamics over time. Assessing the performance of reconstruction tools, parametric bootstrapping and detecting data outliers all require simulating reconstructed trees (linking extant taxa) under the BD model in backward time, conditioned on the number of species sampled at present day and, in some cases, on a specific time since the most recent common ancestor (MRCA). The few simulation tools that exist scale poorly to large modern phylogenies, which can comprise thousands or even millions of tips, a number that keeps rising.

Results: Here I present efficient software for simulating reconstructed phylogenies under time-dependent BD models in backward time, conditioned on the number of sampled species and (optionally) on the time since the MRCA. On large trees, my software is 1000-10 000 times faster than existing tools.
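
To make the idea of backward-time simulation conditioned on the number of tips concrete, the following is a minimal Python sketch. It is not the algorithm implemented in castor. It assumes the simplest special case: constant speciation rate (birth) greater than the constant extinction rate (death), complete sampling of extant species, and conditioning on the stem age. Under these assumptions, a standard result for the reconstructed BD process is that the internal node ages are independent draws from a known distribution, which can be sampled by inverting its cumulative distribution function; lineages are then joined at random going backward in time. The function names sample_node_ages and build_tree are hypothetical and chosen for illustration only.

```python
import math
import random


def sample_node_ages(n_tips, birth, death, stem_age, rng=None):
    """Draw the n_tips - 1 internal node ages of a reconstructed tree.

    Assumptions (illustrative, not taken from castor): constant rates with
    birth > death >= 0, complete sampling, conditioning on n_tips and stem_age.
    Ages are i.i.d. with CDF G(s) / G(stem_age), where
    G(s) = (1 - exp(-r s)) / (birth - death * exp(-r s)) and r = birth - death.
    """
    if not birth > death >= 0:
        raise ValueError("this sketch requires birth > death >= 0")
    if n_tips < 2 or stem_age <= 0:
        raise ValueError("need n_tips >= 2 and stem_age > 0")
    rng = rng or random.Random()
    r = birth - death
    decay = math.exp(-r * stem_age)
    g_max = (1.0 - decay) / (birth - death * decay)  # G(stem_age)
    ages = []
    for _ in range(n_tips - 1):
        c = rng.random() * g_max  # uniform draw on [0, G(stem_age))
        # Invert G: solve G(s) = c for s.
        ages.append(math.log((1.0 - c * death) / (1.0 - c * birth)) / r)
    return ages


def build_tree(node_ages, rng=None):
    """Join lineages backward in time, youngest node first.

    At each node age two randomly chosen active lineages coalesce.
    Tips are numbered 0..n-1 and internal nodes n..2n-2 (the root is last).
    Returns (parent, age); the root has parent -1.
    """
    rng = rng or random.Random()
    n_tips = len(node_ages) + 1
    age = [0.0] * n_tips + sorted(node_ages)
    parent = [-1] * (2 * n_tips - 1)
    active = list(range(n_tips))
    for node in range(n_tips, 2 * n_tips - 1):
        for _ in range(2):
            # Swap a random lineage to the end and remove it in O(1).
            i = rng.randrange(len(active))
            active[i], active[-1] = active[-1], active[i]
            parent[active.pop()] = node
        active.append(node)
    return parent, age


if __name__ == "__main__":
    rng = random.Random(42)
    ages = sample_node_ages(n_tips=10, birth=1.0, death=0.5, stem_age=5.0, rng=rng)
    parent, age = build_tree(ages, rng)
    print(parent)
    print([round(a, 3) for a in age])
```

Because each node age is drawn independently and each join takes constant time, the cost of this sketch grows roughly linearly with the number of tips (plus one sort). Time-dependent rates, incomplete sampling and conditioning on the crown age, all of which the abstract mentions or implies, are outside its scope.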

Availability and implementation: The presented software is incorporated into the R package 'castor', which is available on The Comprehensive R Archive Network (CRAN).

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Phylogeny
  • Software*