SNP selection and multidimensional scaling to quantify population structure

Genet Epidemiol. 2009 Sep;33(6):488-96. doi: 10.1002/gepi.20401.


In the new era of large-scale collaborative genome-wide association studies (GWAS), population stratification has become a critical issue that must be addressed. To build upon the methods developed to control the confounding effect of a structured population, it is important to visualize and quantify that effect. In this work, we develop methodology for single nucleotide polymorphism (SNP) selection and subsequent visualization of population stratification, based on deviation from Hardy-Weinberg equilibrium in conjunction with non-metric multidimensional scaling (MDS), a distance-based multivariate technique. Simulation shows that SNP selection based on Hardy-Weinberg disequilibrium (HWD) is robust against the confounding linkage disequilibrium patterns that have been problematic in past studies and methods, and that it produces a differentiated SNP set. Theoretical and empirical study of HapMap samples shows non-metric MDS to be a multivariate visualization tool preferable to principal components when used in conjunction with HWD SNP selection. The proposed selection tool offers a simple and effective way to select appropriate substructure-informative markers for exploring the effect that population stratification may have in association studies.
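The idea of ranking SNPs by their deviation from Hardy-Weinberg proportions can be illustrated with a minimal sketch. The abstract does not state which test statistic, threshold, or genotype coding the authors used, so everything below is an assumption for illustration: genotypes are coded as allele counts (0, 1, 2), deviation is measured with the standard one-SNP chi-square goodness-of-fit statistic, and the function names `hwd_chi_square` and `select_snps` are hypothetical. The toy data rely on a well-known effect: pooling subpopulations with different allele frequencies produces a heterozygote deficit, which inflates the statistic.

```python
from collections import Counter


def hwd_chi_square(genotypes):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    genotypes: list with one entry per individual, giving the number of
    copies (0, 1, or 2) of one chosen allele. This coding is an assumption.
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    # Frequency of the counted allele, estimated from the sample.
    p = (2 * counts[2] + counts[1]) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0  # monomorphic SNP: no deviation can be measured
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = {0: n * q * q, 1: 2 * n * p * q, 2: n * p * p}
    return sum((counts[g] - expected[g]) ** 2 / expected[g] for g in (0, 1, 2))


def select_snps(snp_genotypes, k):
    """Return the indices of the k SNPs with the largest HWD statistic."""
    scores = [hwd_chi_square(g) for g in snp_genotypes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): three SNPs typed in 20 individuals.
in_equilibrium = [0] * 5 + [1] * 10 + [2] * 5  # matches HWE at p = 0.5
stratified = [0] * 10 + [2] * 10               # two pooled groups, no heterozygotes
monomorphic = [0] * 20                         # carries no information

top = select_snps([in_equilibrium, stratified, monomorphic], k=1)
```

In this toy input the SNP with no heterozygotes is ranked first, while the SNP in equilibrium and the monomorphic SNP both score zero. A selected SNP set of this kind would then feed a distance-based method such as non-metric MDS, which is not shown here.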

MeSH terms

  • Algorithms
  • Genetic Markers
  • Genetics, Population / methods*
  • Genome-Wide Association Study
  • Humans
  • Polymorphism, Single Nucleotide*
  • Selection, Genetic*

