Choosing haplotype-tagging SNPs based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study

Hum Hered. 2003;55(1):27-36. doi: 10.1159/000071807.


We describe an approach for selecting haplotype-tagging single nucleotide polymorphisms (htSNPs) that is currently being used in two large nested case-control studies within the Multiethnic Cohort (MEC). These studies are searching for associations between the risk of prostate and breast cancer and common genetic variation in candidate genes. In a preliminary sample of 70 control subjects chosen at random from each of the 5 ethnic groups in the MEC, we genotype a high density of SNPs, selected every 3-5 kb in and around a candidate gene, and then estimate haplotype frequencies using a variant of the Excoffier-Slatkin E-M algorithm. To evaluate the performance of a candidate set of htSNPs (which will be genotyped in the much larger case-control sample), we treat the haplotype frequencies estimated above as known and carry out a formal calculation of the uncertainty in the number of copies of each common haplotype carried by an individual, summarizing this calculation as a coefficient of determination, R^2_h. A candidate set of htSNPs of a given size is chosen so as to maximize the minimum value of R^2_h over the common haplotypes, h.
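The selection criterion can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes haplotype frequencies are already known (the E-M estimation step is omitted), that the two haplotypes of an individual pair at random (Hardy-Weinberg proportions), and that R^2_h is taken as the variance of the expected number of copies of haplotype h given the unphased htSNP genotype, divided by the variance 2p(1-p) of the true number of copies. The function names `r2_h` and `best_tag_set`, the 0.05 cutoff for "common" haplotypes, and the toy haplotypes and frequencies are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations, product


def r2_h(haplotypes, freqs, tag_idx):
    """Return R^2_h for each haplotype, given the htSNP positions in tag_idx.

    haplotypes: list of tuples of 0/1 alleles; freqs: matching frequencies,
    treated as known. Assumes random pairing of haplotypes (an assumption).
    """
    n = len(haplotypes)
    # Group ordered haplotype pairs by the unphased genotype at the htSNPs.
    groups = defaultdict(list)
    for i, j in product(range(n), repeat=2):
        genotype = tuple(haplotypes[i][k] + haplotypes[j][k] for k in tag_idx)
        groups[genotype].append((freqs[i] * freqs[j], i, j))

    result = []
    for h in range(n):
        p = freqs[h]
        second_moment = 0.0  # E[ E[copies of h | genotype]^2 ]
        for members in groups.values():
            p_g = sum(w for w, _, _ in members)
            if p_g == 0.0:
                continue
            cond_mean = sum(w * ((i == h) + (j == h)) for w, i, j in members) / p_g
            second_moment += p_g * cond_mean ** 2
        # Var(E[copies | genotype]) divided by Var(copies) = 2p(1-p).
        result.append((second_moment - (2 * p) ** 2) / (2 * p * (1 - p)))
    return result


def best_tag_set(haplotypes, freqs, size, min_freq=0.05):
    """Exhaustively pick the htSNP set of the given size that maximizes
    the minimum R^2_h over common haplotypes (min_freq is a hypothetical cutoff)."""
    common = [h for h, p in enumerate(freqs) if p >= min_freq]
    n_snps = len(haplotypes[0])

    def worst(tags):
        r2 = r2_h(haplotypes, freqs, tags)
        return min(r2[h] for h in common)

    best = max(combinations(range(n_snps), size), key=worst)
    return best, worst(best)


# Toy data (made up for illustration): three haplotypes over three SNPs.
toy_haps = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]
toy_freqs = [0.5, 0.3, 0.2]
tags, score = best_tag_set(toy_haps, toy_freqs, size=1)
```

The exhaustive search over subsets is only practical for a small number of SNPs; with more SNPs a heuristic search would be needed, and the abstract does not say which search strategy the authors used.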

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Aromatase / genetics*
  • Breast Neoplasms / ethnology*
  • Breast Neoplasms / genetics
  • Case-Control Studies
  • Cohort Studies
  • Computer Simulation
  • Female
  • Genetic Predisposition to Disease
  • Genotype
  • Haplotypes / genetics*
  • Humans
  • Male
  • Polymorphism, Single Nucleotide / genetics*
  • Prostatic Neoplasms / ethnology*
  • Prostatic Neoplasms / genetics


Substances

  • Aromatase