Principal components analysis corrects for stratification in genome-wide association studies

Nat Genet. 2006 Aug;38(8):904-9. doi: 10.1038/ng1847. Epub 2006 Jul 23.


Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
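
As a rough illustration of the idea in the abstract, the Python sketch below infers axes of variation from a genotype matrix with principal components analysis, then removes ancestry along those axes from both a candidate marker and the phenotype before testing for association. The abstract gives no implementation details, so the function names (`ancestry_axes`, `residualize`, `adjusted_statistic`), the per-marker normalization, and the statistic of (N - K - 1) times the squared correlation are assumptions made for this sketch and should not be read as the published algorithm.

```python
import numpy as np


def ancestry_axes(genotypes, k):
    """Return the top-k axes of variation as an (n_samples, k) array.

    genotypes: (n_markers, n_samples) array of allele counts (0, 1, 2).
    Assumption: each marker is centered and scaled by its own standard
    deviation before the decomposition.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=1, keepdims=True)
    sd = g.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave monomorphic markers as all zeros
    x = g / sd
    # Right singular vectors of the normalized matrix are the principal
    # axes across samples, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T


def residualize(values, axes):
    """Center `values` and remove its least-squares projection onto `axes`."""
    v = np.asarray(values, dtype=float)
    v = v - v.mean()
    coef, *_ = np.linalg.lstsq(axes, v, rcond=None)
    return v - axes @ coef


def adjusted_statistic(marker, phenotype, axes):
    """Association statistic after adjusting marker and phenotype for ancestry.

    Assumption: the statistic is (N - K - 1) * r^2, where r is the
    correlation between the ancestry-adjusted marker and phenotype.
    """
    n, k = axes.shape
    g_adj = residualize(marker, axes)
    y_adj = residualize(phenotype, axes)
    denom = (g_adj @ g_adj) * (y_adj @ y_adj)
    if denom == 0:
        return 0.0
    r2 = (g_adj @ y_adj) ** 2 / denom
    return (n - k - 1) * r2
```

Because the marker itself is adjusted along each axis, the size of the correction depends on how strongly that marker's frequency varies with ancestry: a marker with no frequency difference across populations is left almost unchanged, while a highly differentiated marker is corrected heavily.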

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Case-Control Studies
  • Databases, Nucleic Acid
  • Genetic Markers
  • Genome, Human
  • Genomics / statistics & numerical data*
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis

