Missing data methods in Mendelian randomization studies with multiple instruments

Am J Epidemiol. 2011 Nov 1;174(9):1069-76. doi: 10.1093/aje/kwr235. Epub 2011 Sep 30.


Mendelian randomization studies typically have low power. Where there are several valid candidate genetic instruments, precision can be gained by using all the instruments available. However, sporadically missing genetic data can offset this gain. The authors describe 4 Bayesian methods for imputing the missing data based on a missing-at-random assumption: multiple imputation, single nucleotide polymorphism (SNP) imputation, latent variables, and haplotype imputation. These methods are demonstrated in a simulation study and then applied to estimate the causal relation between C-reactive protein and each of fibrinogen and coronary heart disease, based on 3 SNPs in British Women's Heart and Health Study participants assessed at baseline between May 1999 and June 2000. A complete-case analysis based on all 3 SNPs was found to be more precise than analyses using any 1 SNP alone. Precision was further improved by using any of the 4 proposed missing data methods; the improvement was equivalent to about a 25% increase in sample size. All methods gave similar results, which were apparently not overly sensitive to violation of the missing-at-random assumption. Programming code for the analyses presented is available online.
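
To make concrete why sporadic missingness can offset the gain from using several instruments, the following minimal sketch computes the share of participants who would remain in a complete-case analysis. It is not the authors' code. The per-SNP missingness rates, the sample size, and the function names `complete_case_fraction` and `simulate_complete_cases` are hypothetical and chosen only for illustration, and the sketch assumes each SNP is missing independently of the others.

```python
import random
from math import prod


def complete_case_fraction(missing_rates):
    """Expected fraction of participants with every SNP observed.

    Assumes (for illustration only) that each SNP is missing
    independently of the others with the given probability.
    """
    return prod(1.0 - rate for rate in missing_rates)


def simulate_complete_cases(n, missing_rates, seed=0):
    """Count simulated participants who have no missing SNP."""
    rng = random.Random(seed)
    complete = 0
    for _ in range(n):
        # A participant is a complete case only if all SNPs are observed.
        if all(rng.random() >= rate for rate in missing_rates):
            complete += 1
    return complete


# Hypothetical values, not taken from the study.
rates = [0.05, 0.05, 0.05]  # 5% sporadic missingness on each of 3 SNPs
n = 4000                    # illustrative sample size

expected = complete_case_fraction(rates)
observed = simulate_complete_cases(n, rates) / n

print(f"Expected complete-case fraction: {expected:.3f}")
print(f"Simulated complete-case fraction: {observed:.3f}")
```

Under these assumed rates, about 86% of participants (0.95 cubed) have all 3 SNPs observed, so a complete-case analysis discards roughly 1 participant in 7 even though each SNP is individually 95% complete. Imputation methods of the kind described above aim to recover the information held by those partially observed participants.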

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • C-Reactive Protein / genetics
  • Coronary Disease / epidemiology
  • Coronary Disease / genetics
  • Female
  • Fibrinogen / genetics
  • Genetic Association Studies
  • Haplotypes / genetics
  • Humans
  • Mendelian Randomization Analysis / methods*
  • Mendelian Randomization Analysis / standards
  • Models, Genetic
  • Multivariate Analysis


Substances

  • Fibrinogen
  • C-Reactive Protein