Analysis of untyped SNPs: maximum likelihood and imputation methods

Genet Epidemiol. 2010 Dec;34(8):803-15. doi: 10.1002/gepi.20527.

Abstract

Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies that use different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The imputation approach properly controls type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in tests of multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage of the maximum-likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].
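
The two imputation rules named in the abstract (most likely genotype and expected genotype count) can be illustrated with a minimal sketch. It assumes that posterior probabilities for 0, 1, or 2 copies of an allele at the untyped SNP have already been obtained from the typed SNPs and a reference panel; that step is not shown. The function name `impute_genotypes` and the example probabilities are hypothetical and are not taken from the paper or its software.

```python
from typing import List, Sequence, Tuple


def impute_genotypes(
    posteriors: Sequence[Tuple[float, float, float]],
) -> Tuple[List[int], List[float]]:
    """Return (most likely genotypes, expected genotype counts).

    Each element of `posteriors` is an assumed triple
    (P(G=0), P(G=1), P(G=2)) for one subject at one untyped SNP.
    """
    best_guess: List[int] = []
    dosage: List[float] = []
    for probs in posteriors:
        p0, p1, p2 = probs
        if abs(p0 + p1 + p2 - 1.0) > 1e-8:
            raise ValueError("genotype probabilities must sum to 1")
        # Rule 1: the genotype with the highest posterior probability.
        best_guess.append(max(range(3), key=lambda g: probs[g]))
        # Rule 2: the posterior mean allele count, 0*p0 + 1*p1 + 2*p2.
        dosage.append(p1 + 2.0 * p2)
    return best_guess, dosage


# Hypothetical posterior probabilities for three subjects.
example = [(0.9, 0.1, 0.0), (0.2, 0.5, 0.3), (0.0, 0.25, 0.75)]
best_guess, dosage = impute_genotypes(example)
```

Either output would then be passed to a standard association model as if it were an observed genotype. That second stage ignores the uncertainty in the imputed values, which is consistent with the abstract's statement that the variances are underestimated.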

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Case-Control Studies
  • Computer Simulation
  • Confidence Intervals
  • Cross-Sectional Studies
  • Diabetes Mellitus, Type 1 / genetics
  • Environment
  • Genetic Variation
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Genotype
  • Haplotypes
  • Humans
  • Likelihood Functions*
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide / genetics*
  • Risk
  • Software