Imputation strategies are widely used in settings that involve inference with incomplete data. However, implementation of a particular approach always rests on assumptions, and subtle distinctions between methods can have an impact on subsequent analyses. In this article, we are concerned with regression models in which the true underlying relationship includes interaction terms. We focus in particular on a linear model with one fully observed continuous predictor, a second partially observed continuous predictor, and their interaction. We derive the conditional distribution of the missing covariate and interaction term given the observed covariate and the outcome variable, and examine the performance of a multiple imputation procedure based on this distribution. We also investigate several alternative procedures that can be implemented by adapting multivariate normal multiple imputation software in ways that might be expected to perform well despite incompatibilities between model assumptions and true underlying relationships among the variables. The methods are compared in terms of bias, coverage, and confidence interval width. As expected, the procedure based on the correct conditional distribution performs well across all scenarios. Just as importantly for general practitioners, several of the approaches based on multivariate normality perform comparably with the procedure based on the correct conditional distribution in a number of circumstances. Interestingly, however, procedures that seek to preserve the multiplicative relationship between the interaction term and the main effects are found to be substantially less reliable. For illustration, the various procedures are applied to an analysis of post-traumatic stress disorder symptoms in a study of childhood trauma.
Keywords: interaction; missing covariate; multiple imputation; multivariate normal; regression.
Copyright © 2015 John Wiley & Sons, Ltd.
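
For orientation, the analysis model described in the abstract can be sketched in symbols. The notation is illustrative and assumed here rather than taken from the article: Y denotes the outcome, X_1 the fully observed predictor, X_2 the partially observed predictor, and the normal error term is an assumption typical of linear models. The second line is simply Bayes' rule for the imputation target and is not a statement of the article's derived result.

```latex
% Illustrative notation (assumed, not the article's own):
%   Y   = outcome
%   X_1 = fully observed continuous predictor
%   X_2 = partially observed continuous predictor
\begin{align}
  Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon,
      \qquad \varepsilon \sim N(0, \sigma^2)
      \quad \text{(normal errors assumed)}, \\
  % Imputation target for a missing X_2, by Bayes' rule:
  f(x_2 \mid x_1, y) &\propto f(y \mid x_1, x_2)\, f(x_2 \mid x_1).
\end{align}
```

Because X_1 is observed, a draw of X_2 from this conditional distribution also fixes the interaction term X_1 X_2, so under this sketch the two missing quantities are imputed together.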