Imputation methods to improve inference in SNP association studies

Genet Epidemiol. 2006 Dec;30(8):690-702. doi: 10.1002/gepi.20180.


Missing single nucleotide polymorphisms (SNPs) are common in genetic association studies. Subjects with missing SNPs are often discarded from analyses, which may seriously undermine inference about SNP-disease association. In this article, we develop two haplotype-based imputation approaches and one tree-based imputation approach for association studies. The emphasis is on evaluating the impact of imputation on parameter estimation, compared with the standard practice of ignoring missing data. The haplotype-based approaches build on haplotype reconstruction by the expectation-maximization (EM) algorithm or a weighted EM (WEM) algorithm, depending on whether case-control status is taken into account. The tree-based approach uses a Gibbs sampler to iteratively sample from a full conditional distribution, which is obtained from the classification and regression tree (CART) algorithm. We employ a standard multiple imputation procedure to account for the uncertainty of imputation. We apply the methods to simulated data as well as to a case-control study of developmental dyslexia. Our results suggest that imputation generally improves efficiency over the standard practice of ignoring missing data. The tree-based approach performs comparably to the haplotype-based approaches and has a computational advantage. The WEM approach yields the smallest bias, at the price of increased variance.
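
The abstract refers to "a standard multiple imputation procedure" without giving details. A minimal sketch of what such a procedure commonly involves is shown here, assuming the per-imputation estimates are pooled with Rubin's combining rules; that choice, the function name `pool_rubin`, and the numbers in the usage example are illustrative assumptions, not taken from the article.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules (assumed here).

    estimates: point estimates (e.g. log odds ratios), one per imputed data set
    variances: squared standard errors of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)

    # Total variance: within-imputation part plus inflated between-imputation part,
    # which carries the extra uncertainty due to imputing the missing SNPs.
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical usage: three imputed data sets, made-up estimates.
pooled, total = pool_rubin([0.4, 0.5, 0.6], [0.01, 0.01, 0.01])
```

The between-imputation term is what distinguishes this from analyzing a single filled-in data set: ignoring it would understate the standard error of the SNP effect.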

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Chromosome Mapping
  • Computer Simulation
  • Dyslexia / genetics
  • Haplotypes
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide*
  • Probability
  • Reproducibility of Results
  • Research Design