Practical population group assignment with selected informative markers: characteristics and properties of Bayesian clustering via STRUCTURE

Genet Epidemiol. 2005 May;28(4):302-12. doi: 10.1002/gepi.20070.


Population stratification, which is caused by population genetic substructure (PGS), is a critical issue for the design and interpretation of genetic association studies. Methods to address this problem have been devised, but little is known at this point about practical genotyping requirements for resolving PGS based on different marker characteristics. In this report, we seek to (1) identify a small, practical marker set to differentiate African Americans (AAs) from European Americans (EAs), and (2) assess the impact of marker efficiency and sample size on clustering individuals into subgroups by the methods of STRUCTURE (Pritchard et al., [2000a] Genetics 155:945-959). A panel of 37 markers was genotyped for 865 individuals (640 EAs and 225 AAs) from the Northeastern United States. Among EAs, the assignment accuracy reached >99% using only the 4 most efficient markers. Among AAs, the assignment accuracy exceeded 95% when using the 6 most informative markers. Smaller sample size increased the variance in population differentiation, rather than degrading the results consistently. We conclude that the use of marker-efficiency measures for marker selection yielded a relatively small set of STR markers that were effective at differentiating EA and AA populations. The number of markers required is much lower than has been suggested in previous studies.
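The marker-selection idea in the abstract (rank markers by an efficiency measure, then keep the top few) can be illustrated with a minimal sketch. The abstract does not name the efficiency measure used, so this example assumes a common one: delta, half the summed absolute allele-frequency difference between two populations. The function names, marker names, and allele frequencies below are hypothetical toy values, not data from the study, and the accuracy helper only scores assignments that a clustering program such as STRUCTURE would produce; it does not reimplement STRUCTURE.

```python
def delta(freq_a, freq_b):
    """Half the summed absolute allele-frequency difference between two
    populations (0 = identical frequencies, 1 = no shared alleles).
    Assumed measure; the abstract does not specify which one was used."""
    alleles = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(a, 0.0) - freq_b.get(a, 0.0)) for a in alleles)


def rank_markers(markers):
    """Return marker names sorted from most to least informative by delta."""
    return sorted(markers, key=lambda name: delta(*markers[name]), reverse=True)


def assignment_accuracy(true_labels, assigned_labels):
    """Fraction of individuals whose assigned group matches the known group."""
    if len(true_labels) != len(assigned_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == a for t, a in zip(true_labels, assigned_labels))
    return matches / len(true_labels)


# Hypothetical toy allele frequencies per marker: (population 1, population 2).
toy_markers = {
    "marker_A": ({"10": 0.9, "11": 0.1}, {"10": 0.2, "11": 0.8}),
    "marker_B": ({"7": 0.5, "8": 0.5}, {"7": 0.4, "8": 0.6}),
    "marker_C": ({"3": 1.0}, {"4": 1.0}),
}

ranked = rank_markers(toy_markers)
top_two = ranked[:2]  # the subset one would pass on to the clustering step
print(ranked, top_two)
```

In practice, the top-ranked subset would be genotyped and submitted to the clustering software, and `assignment_accuracy` would then compare the resulting group assignments with self-reported group labels.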

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Black or African American / genetics*
  • Cluster Analysis
  • Genetic Markers
  • Genetics, Population*
  • Genotype
  • Humans
  • Models, Genetic
  • Sample Size
  • White People / genetics*

