ObStruct: a method to objectively analyse factors driving population structure using Bayesian ancestry profiles

PLoS One. 2014 Jan 9;9(1):e85196. doi: 10.1371/journal.pone.0085196. eCollection 2014.

Abstract

Bayesian inference methods are extensively used to detect the presence of population structure given genetic data. The primary output of software implementing these methods is a set of ancestry profiles for the sampled individuals. While these profiles robustly partition the data into subgroups, there is currently no objective method to determine whether the fixed factor of interest (e.g. geographic origin) correlates with the inferred subgroups and, if so, which populations drive this correlation. We present ObStruct, a novel tool that uses established statistical methods to objectively analyse the nature of the structure revealed in Bayesian ancestry profiles. ObStruct evaluates the extent of structural similarity between sampled and inferred populations, tests the significance of population differentiation, provides information on the contribution of sampled and inferred populations to the observed structure and, crucially, determines whether the predetermined factor of interest correlates with the inferred population structure. Analyses of simulated and experimental data highlight ObStruct's ability to objectively assess the nature of structure in populations. We show that the method captures an increase in the level of structure with increasing time since divergence between simulated populations. We further applied the method to a highly structured dataset of 1,484 humans from seven continents and a less structured dataset of 179 Saccharomyces cerevisiae isolates from three regions in New Zealand. Our results show that ObStruct provides an objective metric to classify the degree, drivers and significance of inferred structure, offers novel insights into the relationships between sampled populations, and adds a final step to the pipeline for population structure analyses.
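The abstract does not spell out the statistics ObStruct uses. Purely as an illustration of the kind of analysis it describes (how much ancestry-profile variation the sampled populations explain, whether that is significant, and which populations drive it), the Python sketch below assumes an ANOVA-style R² (between-population sum of squares over total sum of squares, pooled across all K ancestry components) with a label-permutation test. The function names `r_squared`, `population_contributions` and `permutation_test` and the toy profiles are hypothetical; this is not ObStruct's implementation, and the published method's exact statistics may differ.

```python
import random
from collections import defaultdict

# Hypothetical sketch, NOT ObStruct's code: an assumed ANOVA-style R^2 on
# ancestry profiles (rows = individuals, columns = K inferred clusters).


def _sums_of_squares(profiles, labels):
    """Return (total SS, {sampled population: between-population SS term})."""
    n, k = len(profiles), len(profiles[0])
    grand = [sum(row[j] for row in profiles) / n for j in range(k)]
    total_ss = sum((row[j] - grand[j]) ** 2 for row in profiles for j in range(k))
    groups = defaultdict(list)
    for row, label in zip(profiles, labels):
        groups[label].append(row)
    between = {}
    for label, rows in groups.items():
        means = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
        between[label] = len(rows) * sum((means[j] - grand[j]) ** 2 for j in range(k))
    return total_ss, between


def r_squared(profiles, labels):
    """Fraction of ancestry-profile variance explained by sampled population."""
    total_ss, between = _sums_of_squares(profiles, labels)
    return sum(between.values()) / total_ss if total_ss > 0 else 0.0


def population_contributions(profiles, labels):
    """Share of the between-population SS attributable to each sampled population."""
    _, between = _sums_of_squares(profiles, labels)
    b_total = sum(between.values())
    return {label: (b / b_total if b_total > 0 else 0.0) for label, b in between.items()}


def permutation_test(profiles, labels, n_perm=999, seed=0):
    """Compare the observed R^2 with R^2 under randomly shuffled population labels."""
    rng = random.Random(seed)
    observed = r_squared(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not hide exact ties.
        if r_squared(profiles, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy K=2 ancestry profiles for two hypothetical sampled populations A and B.
    profiles = [
        [0.90, 0.10], [0.85, 0.15], [0.95, 0.05], [0.80, 0.20],
        [0.10, 0.90], [0.20, 0.80], [0.05, 0.95], [0.15, 0.85],
    ]
    labels = ["A"] * 4 + ["B"] * 4
    r2, p = permutation_test(profiles, labels)
    # R^2 prints as 0.978; the p-value depends on the random shuffles.
    print(f"R^2 = {r2:.3f}, permutation p = {p:.3f}")
    print(population_contributions(profiles, labels))  # A and B each contribute 0.5
```

In this toy example only C(8,4) = 70 distinct splits of eight individuals exist, and both the observed split and its label-swapped mirror reach the maximum R², so the permutation p-value cannot fall much below 2/70 ≈ 0.03 however many shuffles are drawn. Real datasets with more individuals per population avoid this floor.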

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Continental Population Groups / classification
  • Continental Population Groups / genetics*
  • Genetic Variation
  • Humans
  • Microsatellite Repeats
  • Models, Genetic*
  • New Zealand
  • Phylogeography
  • Population Dynamics / statistics & numerical data*
  • Saccharomyces cerevisiae / classification
  • Saccharomyces cerevisiae / genetics
  • Software*

Grant support

This work was funded by a grant to MG by the Faculty of Science, University of Auckland. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.