Non-parametric approach for frequentist multiple imputation in survival analysis with missing covariates

Stat Methods Med Res. 2021 Jul;30(7):1691-1707. doi: 10.1177/09622802211011197. Epub 2021 Jun 10.


In clinical and epidemiological studies using survival analysis, some explanatory variables are often missing. When this occurs, multiple imputation (MI) is frequently used in practice. In many cases, simple parametric imputation models are routinely adopted without checking the validity of the model specification, and misspecified imputation models can bias parameter estimates. In this study, we describe novel frequentist-type MI procedures for survival analysis using proportional and additive hazards models. The procedures are based on non-parametric estimation techniques and do not require the correct specification of parametric imputation models. For continuous missing covariates, we first sample imputation values from a parametric imputation model. Then, we obtain estimates by solving the estimating equation modified by non-parametrically estimated conditional densities. For categorical missing covariates, we directly sample imputation values from a non-parametrically estimated conditional distribution and then obtain estimates by solving the corresponding estimating equation. We evaluate the performance of the proposed procedures in two simulation studies: one uses purely simulated data, and the other uses data informed by parameters generated from a real-world medical claims database. We also apply the procedures to a pharmacoepidemiological study that examined the effect of antihyperlipidemics on hyperglycemia incidence.
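The categorical case described above can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a fully discrete conditioning set (an event indicator and one observed categorical covariate), estimates the conditional distribution of the missing covariate by empirical frequencies among complete cases, and stops before the hazards-model estimating equation is solved. The function names (`fit_conditional_pmf`, `impute_once`, `multiply_impute`), the toy records, and the fallback to the marginal distribution for empty strata are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_conditional_pmf(rows, target, given):
    """Empirical counts of `target` within each stratum of `given` (complete cases only)."""
    counts = defaultdict(Counter)
    for r in rows:
        if r[target] is not None:
            counts[tuple(r[g] for g in given)][r[target]] += 1
    return counts


def impute_once(rows, target, given, counts, rng):
    """Return a completed copy of `rows`, drawing each missing `target` value
    from the estimated conditional distribution of its stratum."""
    # Assumption: strata with no complete cases fall back to the marginal counts.
    marginal = Counter()
    for c in counts.values():
        marginal.update(c)
    completed = []
    for r in rows:
        r = dict(r)  # copy so the original records stay untouched
        if r[target] is None:
            c = counts.get(tuple(r[g] for g in given)) or marginal
            levels = sorted(c)
            weights = [c[level] for level in levels]
            r[target] = rng.choices(levels, weights=weights, k=1)[0]
        completed.append(r)
    return completed


def multiply_impute(rows, target, given, m=5, seed=0):
    """Generate m completed data sets from the same estimated conditional distribution."""
    rng = random.Random(seed)
    counts = fit_conditional_pmf(rows, target, given)
    return [impute_once(rows, target, given, counts, rng) for _ in range(m)]


# Toy records (hypothetical): `smoker` is the partially missing categorical covariate.
rows = [
    {"event": 1, "sex": "F", "smoker": 1},
    {"event": 1, "sex": "F", "smoker": 1},
    {"event": 1, "sex": "F", "smoker": None},
    {"event": 0, "sex": "M", "smoker": 0},
    {"event": 0, "sex": "M", "smoker": 1},
    {"event": 0, "sex": "M", "smoker": None},
]
completed_sets = multiply_impute(rows, target="smoker", given=("event", "sex"), m=5)
```

Each element of `completed_sets` would then be passed to a proportional or additive hazards fit, and the per-data-set estimates combined. With continuous conditioning variables such as the observed survival time, empirical frequencies would need to be replaced by a smoothed estimate (for example, kernel weights), which this sketch does not attempt.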

Keywords: Density ratio estimation; hazards model; missing data analysis; model misspecification; non-parametric estimation; observational study.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Factual
  • Models, Statistical*
  • Survival Analysis