Estimates of genetic differentiation measured by F(ST) do not necessarily require large sample sizes when using many SNP markers

PLoS One. 2012;7(8):e42649. doi: 10.1371/journal.pone.0042649. Epub 2012 Aug 14.


Population genetic studies provide insights into the evolutionary processes that influence the distribution of sequence variants within and among wild populations. F(ST) is among the most widely used measures of genetic differentiation and plays a central role in ecological and evolutionary genetic studies. It is commonly thought that large sample sizes are required to infer F(ST) precisely and that small sample sizes lead to overestimation of genetic differentiation. Until recently, studies in ecological model organisms incorporated a limited number of genetic markers, but since the emergence of next-generation sequencing, the panel size of genetic markers available even in non-reference organisms has rapidly increased. In this study we examine whether a large number of genetic markers can compensate for small sample sizes when estimating F(ST). We tested the behavior of three different estimators of F(ST) that are commonly used in population genetic studies. By simulating populations, we assessed the effects of sample size and the number of markers on the various estimates of genetic differentiation. Furthermore, we tested the effect of ascertainment bias on these estimates. We show that the population sample size can be substantially reduced (to as few as n = 4-6) when using an appropriate estimator and a large number of bi-allelic genetic markers (k > 1,000). Therefore, by using markers developed with next-generation sequencing, conservation genetic studies can now obtain almost the same statistical power as studies performed on model organisms.
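
The kind of simulation described above can be illustrated with a minimal sketch. It is not the study's code. It assumes a Balding-Nichols model for two diverged populations and uses Hudson's estimator with a small-sample correction, combined across SNPs as a ratio of sums; this estimator may or may not be among the three the study tested. All function names and parameter values (true F(ST) = 0.05, n = 5 diploid individuals per population, k = 2,000 SNPs) are illustrative assumptions.

```python
import random


def simulate_counts(k, n_diploid, fst, rng):
    """Simulate allele counts at k bi-allelic SNPs in two populations.

    Assumption: Balding-Nichols model. Each population's allele frequency
    is drawn from a Beta distribution centred on an ancestral frequency,
    with spread controlled by the true F(ST).
    """
    n_alleles = 2 * n_diploid
    scale = (1.0 - fst) / fst
    counts = []
    for _ in range(k):
        p_anc = rng.uniform(0.1, 0.9)  # ancestral allele frequency
        pair = []
        for _pop in range(2):
            p = rng.betavariate(p_anc * scale, (1.0 - p_anc) * scale)
            # Sample n_alleles chromosomes from this population.
            pair.append(sum(rng.random() < p for _ in range(n_alleles)))
        counts.append(tuple(pair))
    return counts


def fst_ratio_of_sums(counts, n_alleles, correct=True):
    """Hudson-type F(ST) estimate, summed over SNPs before dividing.

    With correct=False the sampling-variance terms are left out, which
    is the naive estimate that is inflated at small sample sizes.
    """
    num = 0.0
    den = 0.0
    for a1, a2 in counts:
        p1 = a1 / n_alleles
        p2 = a2 / n_alleles
        term = (p1 - p2) ** 2
        if correct:
            term -= p1 * (1.0 - p1) / (n_alleles - 1)
            term -= p2 * (1.0 - p2) / (n_alleles - 1)
        num += term
        den += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return num / den


if __name__ == "__main__":
    rng = random.Random(1)
    true_fst, n_diploid, k = 0.05, 5, 2000  # illustrative values only
    counts = simulate_counts(k, n_diploid, true_fst, rng)
    print("corrected:", fst_ratio_of_sums(counts, 2 * n_diploid))
    print("naive:    ", fst_ratio_of_sums(counts, 2 * n_diploid, correct=False))
```

In this sketch the corrected estimate should land near the simulated value even with five individuals per population, while the uncorrected one is inflated by sampling noise, consistent with the abstract's point that the choice of estimator matters when n is small and k is large.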

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Animals
  • Ecology
  • Female
  • Gene Frequency
  • Genetic Drift
  • Genetic Markers
  • Genetics, Population
  • Genome
  • Genotype
  • Humans
  • Male
  • Models, Genetic*
  • Models, Statistical
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA


Grant support

Supported by a Gottfried Wilhelm Leibniz Award of the Deutsche Forschungsgemeinschaft to Detlef Weigel at the Max Planck Institute for Developmental Biology, and the Max Planck Society. CvO was funded by ELSA, The Earth and Life Sciences Alliance. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.