Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals

Hum Hered. 2003;55(4):179-90. doi: 10.1159/000073202.


The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium to facilitate the pooling of data on very large numbers of people concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series in which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risks for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is in developing and evaluating extensions of recently described methods based on haplotype imputation (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002). Those methods provide score tests of the null hypothesis of no effect of SNP haplotypes upon risk; the extensions may be used for more complex tasks, such as providing confidence intervals and tests of equivalence of haplotype-specific risks in two or more separate populations. To do so we (1) develop a cohort approach to odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection; and (3) in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study), compare likelihood-based confidence interval estimates from the two methods with each other and with the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses.
We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the R_h^2 criterion of Stram et al. [1]), the differences between the three methods are very small and, in particular, the single-imputation method may be expected to work extremely well.
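
As a rough illustration of the haplotype imputation step that these methods build on, the sketch below runs a textbook E-M iteration for haplotype frequencies from unphased SNP genotypes and then computes expected haplotype counts per individual, the kind of quantity a single-imputation analysis might pass to logistic regression. It is not the extended algorithm of the paper: it assumes Hardy-Weinberg equilibrium, biallelic SNPs coded as minor-allele counts (0, 1, 2), no missing data, and no case-control selection correction or odds ratio parameters. The function names are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype, haplotypes):
    """Ordered haplotype pairs whose allele counts add up to the genotype."""
    return [(a, b) for a in haplotypes for b in haplotypes
            if all(x + y == g for x, y, g in zip(a, b, genotype))]


def expected_dosage(genotype, freq):
    """Expected copies (0 to 2) of each haplotype given one unphased genotype.

    Assumes Hardy-Weinberg equilibrium, so an ordered pair (a, b) has prior
    probability freq[a] * freq[b]. The genotype must be compatible with at
    least one pair of haplotypes that have nonzero frequency.
    """
    haplotypes = list(freq)
    pairs = compatible_pairs(genotype, haplotypes)
    weights = [freq[a] * freq[b] for a, b in pairs]
    total = sum(weights)
    dosage = {h: 0.0 for h in haplotypes}
    for (a, b), w in zip(pairs, weights):
        dosage[a] += w / total  # posterior weight credited to each member
        dosage[b] += w / total
    return dosage


def em_haplotype_frequencies(genotypes, n_snps, max_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by E-M."""
    haplotypes = list(product((0, 1), repeat=n_snps))
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(max_iter):
        # E-step: accumulate expected haplotype counts over individuals.
        counts = {h: 0.0 for h in haplotypes}
        for g in genotypes:
            for h, d in expected_dosage(g, freq).items():
                counts[h] += d
        # M-step: each individual contributes two haplotypes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Made-up two-SNP genotypes; (1, 1) is the phase-ambiguous double heterozygote.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_haplotype_frequencies(sample, n_snps=2)
print(estimated)
print(expected_dosage((1, 1), estimated))
```

Enumerating ordered pairs keeps the weights simple, since each ordering gets the same product of frequencies, at the cost of work that grows quickly with the number of SNPs. The sketch is therefore only practical for short haplotypes.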

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Breast Neoplasms / ethnology*
  • Breast Neoplasms / genetics*
  • Case-Control Studies
  • Cohort Studies
  • Computer Simulation
  • Female
  • Genetic Predisposition to Disease
  • Genotype
  • Haplotypes / genetics*
  • Humans
  • Incidence
  • Likelihood Functions
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Risk Factors
  • Steroid 17-alpha-Hydroxylase / genetics*


Substances

  • Steroid 17-alpha-Hydroxylase