Inference from Multiple Imputation for Missing Data Using Mixtures of Normals

Stat Methodol. 2010 May 1;7(3):351-364. doi: 10.1016/j.stamet.2010.01.003.

Abstract

We consider two difficulties with standard multiple imputation methods for missing data based on Rubin's t method for confidence intervals: their often excessive width, and their instability. These problems are present most often when the number of copies is small, as is often the case when a data collection organization is making multiple completed datasets available for analysis. We suggest using mixtures of normals as an alternative to Rubin's t. We also examine the performance of improper imputation methods as an alternative to generating copies from the true posterior distribution for the missing observations. We report the results of simulation studies and analyses of data on health-related quality of life in which the methods suggested here gave narrower confidence intervals and more stable inferences, especially with small numbers of copies or non-normal posterior distributions of parameter estimates. A free R software package called MImix that implements our methods is available from CRAN.