Correction of bias from non-random missing longitudinal data using auxiliary information

Stat Med. 2010 Mar 15;29(6):671-9. doi: 10.1002/sim.3821.


Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bias*
  • Longitudinal Studies*
  • Models, Statistical