Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program STRUCTURE

Mol Ecol. 2012 Oct;21(20):4925-30. doi: 10.1111/j.1365-294X.2012.05754.x. Epub 2012 Sep 24.


Reproducibility is the benchmark for results and conclusions drawn from scientific studies, but systematic studies on the reproducibility of scientific results are surprisingly rare. Moreover, many modern statistical methods make use of 'random walk' model fitting procedures, and these are inherently stochastic in their output. Does the combination of these statistical procedures and current standards of data archiving and method reporting permit the reproduction of the authors' results? To test this, we reanalysed data sets gathered from papers using the software package STRUCTURE to identify genetically similar clusters of individuals. We find that reproducing STRUCTURE results can be difficult despite the straightforward requirements of the program. Our results indicate that in 30% of analyses we were unable to reproduce the same number of population clusters. To improve reproducibility, we make recommendations for future use of the software and for reporting STRUCTURE analyses and results in published works.

MeSH terms

  • Bayes Theorem
  • Cluster Analysis
  • Computational Biology / methods*
  • Data Interpretation, Statistical
  • Databases, Genetic
  • Genetics, Population / methods*
  • Reproducibility of Results
  • Software*