Developing a Prognostic Model in the Presence of Missing Data: An Ovarian Cancer Case Study

J Clin Epidemiol. 2003 Jan;56(1):28-37. doi: 10.1016/s0895-4356(02)00539-5.


When prognostic models are developed in medicine, covariate data are often missing, and the standard response is to exclude individuals with incomplete data from the analysis. This practice reduces statistical power and may bias the results. We wished to develop a prognostic model for overall survival from 1,189 primary cases (842 deaths) of epithelial ovarian cancer. A complete case analysis restricted the sample size to 518 (380 deaths), less than half of the cohort. After applying a multiple imputation (MI) framework, we included three real values for each one imputed and constructed a model containing more statistically significant prognostic factors and with increased predictive ability. Missing values can be imputed where the reason for the data being missing is known, particularly where it can be explained by available data. Doing so will increase the power of an analysis and may produce models that are more statistically reliable and more applicable in clinical practice.
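The abstract does not say how estimates from the imputed data sets were combined. As a minimal sketch, assuming the usual approach of Rubin's rules, the pooling step for a single model coefficient might look like the following. The function name `pool_rubin` and the input numbers are hypothetical and purely illustrative; they are not results from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient fitted on each of m imputed data sets (Rubin's rules).

    estimates: the coefficient from each imputed data set
    variances: its squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = mean(estimates)           # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical log hazard ratios and squared standard errors from m = 3 imputations.
log_hazard_ratios = [0.40, 0.50, 0.60]
squared_ses = [0.04, 0.04, 0.04]
pooled_estimate, pooled_se = pool_rubin(log_hazard_ratios, squared_ses)
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, which is how the uncertainty introduced by imputing values is carried into the final model.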

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Data Collection / standards
  • Female
  • Follow-Up Studies
  • Humans
  • Middle Aged
  • Models, Statistical
  • Ovarian Neoplasms / mortality*
  • Prognosis
  • Regression Analysis
  • Sample Size
  • Scotland / epidemiology
  • Survival Analysis