Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals

Daniel O Stram; Celeste Leigh Pearce; Phillip Bretsky; Matthew Freedman; Joel N Hirschhorn; David Altshuler; Laurence N Kolonel; Brian E Henderson; Duncan C Thomas

doi:10.1159/000073202

Hum Hered. 2003;55(4):179-90. doi: 10.1159/000073202.

Authors

Daniel O Stram¹, Celeste Leigh Pearce, Phillip Bretsky, Matthew Freedman, Joel N Hirschhorn, David Altshuler, Laurence N Kolonel, Brian E Henderson, Duncan C Thomas

Affiliation

¹ Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA. stram@usc.edu

PMID: 14566096
DOI: 10.1159/000073202

Abstract

The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of obtaining haplotype-specific risk estimates for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals, and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection; and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study), we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses. We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the R_h^2 criterion of Stram et al. [1]) the differences between the three methods are very small, and in particular that the single imputation method may be expected to work extremely well.
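
As a rough illustration of the E-M building block that the abstract extends, the Python sketch below estimates haplotype frequencies from unphased SNP genotypes under Hardy-Weinberg equilibrium. It is not the authors' implementation and does not estimate odds ratios or correct for case-control sampling. The genotype coding (0, 1 or 2 copies of the variant allele per SNP) and the function names compatible_pairs and em_haplotype_freqs are assumptions made for this example.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all ordered haplotype pairs (h1, h2) consistent with a genotype.

    A genotype is a tuple of variant-allele counts (0, 1 or 2), one per SNP.
    A haplotype is a tuple of alleles (0 or 1), one per SNP.
    """
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (h1, h2)
        for h1 in haps
        for h2 in haps
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by E-M, assuming Hardy-Weinberg equilibrium."""
    n_snps = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_snps))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    candidates = [compatible_pairs(g) for g in genotypes]

    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        counts = {h: 0.0 for h in haps}
        for pairs in candidates:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total

        # M-step: each person carries two haplotypes, so divide by 2N.
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical data: eight unambiguous subjects and one double heterozygote.
    genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)]
    for hap, freq in sorted(em_haplotype_freqs(genotypes).items()):
        print(hap, round(freq, 4))
```

In this toy data set only the double heterozygote is phase-ambiguous, and the E-M iterations assign it almost entirely to the two haplotypes already seen in the homozygous subjects. The posterior pair weights computed in the E-step are also the quantities from which an expected haplotype dosage could be formed for a single-imputation analysis of the kind the abstract compares.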

Publication types

Comparative Study
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Breast Neoplasms / ethnology*
Breast Neoplasms / genetics*
Case-Control Studies
Cohort Studies
Computer Simulation
Female
Genetic Predisposition to Disease
Genotype
Haplotypes / genetics*
Humans
Incidence
Likelihood Functions
Models, Genetic*
Polymorphism, Single Nucleotide / genetics*
Risk Factors
Steroid 17-alpha-Hydroxylase / genetics*

Substances

Steroid 17-alpha-Hydroxylase
