A general framework for studying genetic effects and gene-environment interactions with missing data

Biostatistics. 2010 Oct;11(4):583-98. doi: 10.1093/biostatistics/kxq015. Epub 2010 Mar 26.


Missing data arise in genetic association studies when genotypes are unknown or when haplotypes are of direct interest. We provide a general likelihood-based framework for making inference on genetic effects and gene-environment interactions with such missing data. We allow genetic and environmental variables to be correlated while leaving the distribution of environmental variables completely unspecified. We consider 3 major study designs (cross-sectional, case-control, and cohort) and construct appropriate likelihood functions for all common phenotypes (e.g., case-control status, quantitative traits, and potentially censored ages at onset of disease). The likelihood functions involve both finite- and infinite-dimensional parameters. The maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Expectation-Maximization (EM) algorithms are developed to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed inferential and numerical methods perform well in practical settings. An illustration with a genome-wide association study of lung cancer is provided.
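
To make the missing-data idea concrete, the following minimal sketch shows an EM iteration for the simplest related problem: estimating haplotype frequencies from unphased genotypes at two biallelic SNPs. It is not the algorithm developed in the paper. It assumes Hardy-Weinberg equilibrium and ignores phenotypes, environmental covariates, and study design, all of which the paper's likelihoods account for. The names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement

# The four possible haplotypes at two biallelic SNPs (0 = major, 1 = minor allele).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Return unordered haplotype index pairs consistent with an unphased genotype.

    `genotype` is a pair of minor-allele counts, e.g. (1, 1) for a double
    heterozygote, whose phase is ambiguous.
    """
    pairs = []
    for i, j in combinations_with_replacement(range(4), 2):
        h1, h2 = HAPLOTYPES[i], HAPLOTYPES[j]
        if all(h1[k] + h2[k] == genotype[k] for k in range(2)):
            pairs.append((i, j))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    freqs = [0.25] * 4  # uniform starting values
    for _ in range(n_iter):
        counts = [0.0] * 4
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each compatible haplotype pair.
            # Under HWE a pair has probability 2*p_i*p_j if i != j, else p_i**2.
            weights = [(2.0 if i != j else 1.0) * freqs[i] * freqs[j]
                       for i, j in pairs]
            total = sum(weights)
            for (i, j), w in zip(pairs, weights):
                counts[i] += w / total
                counts[j] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freqs = [c / (2 * len(genotypes)) for c in counts]
        converged = max(abs(a - b) for a, b in zip(freqs, new_freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Three individuals with genotype (0, 0), three with (2, 2), and one
    # phase-ambiguous double heterozygote (1, 1).
    data = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)]
    print(em_haplotype_frequencies(data))
```

In this toy data set only the double heterozygote is ambiguous. The E-step splits that individual between the two possible phasings in proportion to their current probabilities, and the M-step re-estimates the frequencies from the resulting expected counts.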

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biostatistics / methods*
  • Carcinoma, Non-Small-Cell Lung / etiology
  • Carcinoma, Non-Small-Cell Lung / genetics
  • Case-Control Studies
  • Cohort Studies
  • Computer Simulation
  • Cross-Sectional Studies
  • Cysteine Endopeptidases / genetics
  • Disease / etiology
  • Disease / genetics*
  • Environment*
  • Genetic Association Studies / methods*
  • Genotype
  • Haplotypes / genetics
  • Humans
  • Likelihood Functions
  • Nerve Tissue Proteins / genetics
  • Odds Ratio
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics
  • Receptors, Nicotinic / genetics
  • Regression Analysis
  • Smoking / adverse effects
  • Smoking / genetics

Substances

  • CHRNA5 protein, human
  • Nerve Tissue Proteins
  • Receptors, Nicotinic
  • nicotinic receptor subunit alpha3
  • Cysteine Endopeptidases
  • alpha4 subunit, proteasome 20S, human