Multiple imputation validation study: addressing unmeasured survey data in a longitudinal design

BMC Med Res Methodol. 2021 Jan 6;21(1):5. doi: 10.1186/s12874-020-01158-w.


Background: Questionnaires used in longitudinal studies may have questions added or removed over time for numerous reasons. Data that are entirely missing from one follow-up survey are a problem unique to longitudinal studies. Although an excluded question provides no information at that follow-up survey, it is collected at other follow-up surveys, and the covariances observed there may allow the missing data to be recovered. This study used data from a large longitudinal cohort study to assess the efficiency and feasibility of using multiple imputation (MI) to recover this type of information.
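As a rough illustration of how information from another wave can be carried over, the sketch below imputes a binary item that was not asked at one wave. It uses the relationship between that item and the remaining items in the wave where it was asked. This is a generic, hypothetical sketch and not the model used in the study. It assumes that relationship is stable across waves, and it stratifies on the sum of the other items (each scored 0 to 3). To make the imputation "proper", it draws each stratum's probability from a Beta posterior before drawing the values. The function name `impute_wave_missing_item` and its arguments are illustrative only.

```python
import random
from collections import defaultdict


def impute_wave_missing_item(observed_wave, missing_wave, m=5, seed=0):
    """Hypothetical sketch: impute a binary item missing for a whole wave.

    observed_wave: list of (other_items, item) pairs from the wave where the
        item was asked; other_items is a tuple of 0-3 scores, item is 0 or 1.
    missing_wave: list of other_items tuples from the wave where it was not asked.
    Returns m lists of imputed 0/1 values, one list per imputation.
    """
    rng = random.Random(seed)
    # Tabulate no/yes counts of the item within strata of the summed other items.
    counts = defaultdict(lambda: [0, 0])  # stratum -> [no, yes]
    for others, item in observed_wave:
        counts[sum(others)][item] += 1

    imputations = []
    for _ in range(m):
        # Proper MI: draw each stratum's probability from its Beta posterior
        # (uniform Beta(1, 1) prior) so parameter uncertainty varies across imputations.
        p = {s: rng.betavariate(1 + yes, 1 + no) for s, (no, yes) in counts.items()}
        draws = []
        for others in missing_wave:
            ps = p.get(sum(others))
            if ps is None:  # stratum never observed: draw from the prior instead
                ps = rng.betavariate(1, 1)
            draws.append(1 if rng.random() < ps else 0)
        imputations.append(draws)
    return imputations
```

Each of the `m` completed datasets would then be analyzed separately and the results combined. Stratifying on a sum score keeps the sketch short. A regression-based imputation model could use the individual items instead.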

Methods: Millennium Cohort Study participants completed the 9-item Patient Health Questionnaire (PHQ) depression module at 2 time points (2004 and 2007). The suicidal ideation item in the module was set to missing for the 2007 assessment. Several single-level MI models were fitted, each using a different set of predictors or a different form of the suicidal ideation variable. The imputed 2007 values for this item were then compared with the self-reported values. Additionally, associations of suicidal ideation with sleep duration and smoking status, which are related constructs, were compared between self-reported and imputed values.
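Comparing associations estimated from multiply imputed data usually requires pooling the per-imputation estimates. The abstract does not state how pooling was done, so the sketch below simply shows the standard Rubin's rules. It assumes each imputed dataset yields an estimate (for example, a log odds ratio for suicidal ideation and smoking status) together with its squared standard error. The function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m >= 2 per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, t, math.sqrt(t)
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3) × 1.0. The between-imputation term is what widens the pooled interval to reflect uncertainty about the missing values.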

Results: Among 63,028 participants eligible for imputation analysis, 4.05% reported suicidal ideation on the 2007 survey. The imputation models successfully identified suicidal ideation, with sensitivity ranging from 34 to 66% and positive predictive value from 36 to 42%. Specificity remained above 96% and negative predictive value above 97% for all imputation models. All imputation models produced similar associations with the related constructs. However, the dichotomous suicidal ideation variable imputed from the model using only the PHQ depression items yielded the estimates closest to the self-reported associations in all adjusted analyses.
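The four agreement measures above can be computed by cross-tabulating imputed against self-reported values, with the self-reported value treated as the reference standard. The minimal sketch below does this for a single imputed dataset. The abstract does not say how the measures were summarized across imputations, so that step is left out. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(self_reported, imputed):
    """Sensitivity, specificity, PPV, and NPV of imputed 0/1 values,
    using the self-reported 0/1 values as the reference standard."""
    pairs = list(zip(self_reported, imputed))
    tp = sum(1 for s, i in pairs if s == 1 and i == 1)  # true positives
    fn = sum(1 for s, i in pairs if s == 1 and i == 0)  # false negatives
    fp = sum(1 for s, i in pairs if s == 0 and i == 1)  # false positives
    tn = sum(1 for s, i in pairs if s == 0 and i == 0)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy data: 4 self-reported positives, 6 negatives.
print(diagnostic_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]))
```

In the toy data there are 2 true positives, 2 false negatives, 1 false positive, and 5 true negatives. This gives a sensitivity of 0.5, a specificity of 5/6, a PPV of 2/3, and an NPV of 5/7. When the outcome is rare, specificity and NPV stay high even if sensitivity and PPV are modest. The Results above show the same pattern.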

Conclusions: Although sensitivity and positive predictive value were relatively low, applying MI techniques allowed an otherwise missing variable to be included. Additionally, correlations with related constructs were estimated close to those obtained from self-reported values. Therefore, when suicidal ideation is completely missing from a survey, MI based on the other 8 depression items can be used to estimate it. However, these imputed values should not be used to estimate population prevalence.
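Simple arithmetic shows one way that sensitivity and PPV together bear on prevalence, though the abstract does not say this was the authors' reasoning. The true positives can be counted two ways: as sensitivity × (self-reported positives), or as PPV × (imputed positives). Equating the two, the share of respondents imputed as positive equals true prevalence × sensitivity / PPV. It therefore matches the true prevalence only when sensitivity equals PPV. The sketch below uses hypothetical inputs, not paired values from this study, and the function name is illustrative.

```python
def implied_imputed_prevalence(true_prevalence, sensitivity, ppv):
    """Share of respondents classified positive by the imputation, implied by
    sensitivity * (true positives) == ppv * (imputed positives)."""
    return true_prevalence * sensitivity / ppv


# Hypothetical inputs: 4% true prevalence, sensitivity 0.6, PPV 0.4.
print(implied_imputed_prevalence(0.04, 0.6, 0.4))  # about 0.06, i.e. 1.5x the true 4%
```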

Keywords: Cohort study; Longitudinal data; Major depressive disorder; Multiple imputation; Patient health questionnaire; Suicidal ideation; Survey data.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cohort Studies
  • Depression* / diagnosis
  • Depression* / epidemiology
  • Humans
  • Longitudinal Studies
  • Suicidal Ideation*
  • Surveys and Questionnaires