Linkage effects and analysis of finite sample errors in the HapMap

Hum Hered. 2009;68(2):73-86. doi: 10.1159/000212500. Epub 2009 Apr 9.


The HapMap provides a valuable resource to help uncover genetic variants of important complex phenotypes such as disease risk and outcome. Using the HapMap, we can infer the patterns of linkage disequilibrium (LD) within different human populations. This is a critical step for determining which SNPs to genotype as part of a study, estimating study power, designing a follow-up study to identify the causal variants, 'imputing' untyped SNPs, and estimating recombination rates along the genome. Despite its tremendous importance, the HapMap suffers from the fundamental limitation that at most 60 unrelated individuals are available per population. We present an analytical framework for analyzing the implications of a finite-sample HapMap. We present and justify simple approximations for deriving analytical estimates of important statistics such as the square of the correlation coefficient, r², between two SNPs. Finally, we use this framework to show that current HapMap-based estimates of r² and power have significant errors, and that tag sets greatly overestimate their coverage. We show that a reasonable increase in the number of individuals, such as that proposed by the 1000 Genomes Project, greatly reduces the errors due to finite sample size for a large proportion of SNPs.
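
As a rough illustration of why a finite sample matters (a minimal sketch, not the authors' analytical framework), the Python snippet below computes r² between two biallelic SNPs from haplotype frequencies using the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)), with D = pAB - pA pB, and then simulates estimating it from a limited number of sampled haplotypes. The function names, the example haplotype frequencies, the replicate count, and the assumption that 60 unrelated individuals supply 120 phased haplotypes are illustrative choices and are not taken from the paper.

```python
import random


def r_squared(p_ab, p_a, p_b):
    """Squared correlation between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele 1 at both SNPs
    p_a, p_b: frequency of allele 1 at the first and second SNP
    Returns None when either SNP is monomorphic (r^2 is undefined).
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


def sample_r_squared(hap_freqs, n_haplotypes, rng):
    """Estimate r^2 from n_haplotypes drawn at random from the population.

    hap_freqs maps (allele_at_snp1, allele_at_snp2) -> population frequency.
    """
    haps = list(hap_freqs)
    weights = [hap_freqs[h] for h in haps]
    draws = rng.choices(haps, weights=weights, k=n_haplotypes)
    p_a = sum(h[0] for h in draws) / n_haplotypes
    p_b = sum(h[1] for h in draws) / n_haplotypes
    p_ab = sum(1 for h in draws if h == (1, 1)) / n_haplotypes
    return r_squared(p_ab, p_a, p_b)


def mean_estimate(hap_freqs, n_haplotypes, n_reps, seed=0):
    """Average the sample r^2 over n_reps simulated panels."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        est = sample_r_squared(hap_freqs, n_haplotypes, rng)
        if est is not None:  # skip replicates where a SNP is monomorphic
            estimates.append(est)
    return sum(estimates) / len(estimates)


# Hypothetical population: two independent SNPs, so the true r^2 is 0.
INDEPENDENT = {(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.25, (0, 0): 0.25}

if __name__ == "__main__":
    print("true r^2:", r_squared(0.25, 0.5, 0.5))
    # Assumption: 60 unrelated individuals -> 120 phased haplotypes.
    print("mean estimate, 120 haplotypes:", mean_estimate(INDEPENDENT, 120, 500))
    print("mean estimate, 2000 haplotypes:", mean_estimate(INDEPENDENT, 2000, 500))
```

Because r² cannot be negative, sampling noise alone should push the average estimate above the true value of zero in this example, and the gap should shrink as the number of sampled haplotypes grows.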

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Case-Control Studies
  • Genetic Linkage*
  • Haplotypes*
  • Models, Theoretical
  • Polymorphism, Single Nucleotide
  • Sample Size