OptM: estimating the optimal number of migration edges on population trees using Treemix

Biol Methods Protoc. 2021 Sep 16;6(1):bpab017. doi: 10.1093/biomethods/bpab017. eCollection 2021.

Abstract

The software Treemix is now widely used to estimate the number of migration events, or edges (m), on population trees from genome-wide allele frequency data. However, the appropriate number of edges to include remains unclear. Here, I show that an optimal value of m can be inferred from the second-order rate of change in likelihood (Δm) across incremental values of m. Repurposing a method originally used to estimate the number of population clusters in the software Structure (ΔK), I show using simulated populations that Δm performs as well as current recommendations for Treemix. A demonstration on an empirical dataset from domestic dogs indicates that this method may be preferable for large, complex population histories and can prioritize migration events for subsequent investigation. The method has been implemented in a freely available R package called "OptM" and as a web application (https://rfitak.shinyapps.io/OptM/) that interfaces directly with the output files of Treemix.
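As a rough illustration of the second-order rate of change described above, the following Python sketch computes a ΔK-style statistic from replicate log-likelihoods at consecutive values of m. The abstract does not state the formula, so the form used here (absolute second difference of the mean log-likelihood, divided by the standard deviation across replicates) is an assumption modelled on the ΔK statistic; the function name `delta_m`, the input layout, and the toy values are hypothetical and are not taken from the OptM package.

```python
from statistics import mean, stdev


def delta_m(loglik_by_m):
    """Return {m: score} for every interior m.

    loglik_by_m maps each tested m (consecutive integers assumed) to a list
    of log-likelihoods from replicate runs. The score is an assumed
    Delta-K-style statistic, not necessarily the exact OptM definition.
    """
    ms = sorted(loglik_by_m)
    scores = {}
    # A second difference needs both neighbours, so the first and last m get no score.
    for prev_m, m, next_m in zip(ms, ms[1:], ms[2:]):
        spread = stdev(loglik_by_m[m])
        if spread == 0:
            continue  # statistic is undefined when replicates are identical
        second_diff = abs(
            mean(loglik_by_m[next_m])
            - 2 * mean(loglik_by_m[m])
            + mean(loglik_by_m[prev_m])
        )
        scores[m] = second_diff / spread
    return scores


# Toy, made-up log-likelihoods: two replicates for each of m = 0..3.
toy = {
    0: [-100.0, -102.0],
    1: [-50.0, -52.0],
    2: [-40.0, -42.0],
    3: [-38.0, -40.0],
}
scores = delta_m(toy)
best_m = max(scores, key=scores.get)  # m with the sharpest change in likelihood
```

In this toy input the likelihood gain flattens after m = 1, so the score is largest there. Note that the statistic needs variation among replicates at each m; identical replicate likelihoods leave it undefined, which the sketch handles by skipping that value of m.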

Keywords: SNPs; likelihood; population genomics; structure.