Some issues in estimating the effect of prognostic factors from incomplete covariate data

Stat Med. 1997;16(1-3):57-72. doi: 10.1002/(sici)1097-0258(19970115)16:1<57::aid-sim471>;2-s.


In evaluating prognostic factors by means of regression models, missing values in the covariate data are a frequent complication. There exist statistical tools to analyse such incomplete data in an efficient manner, and in this paper we make use of the traditional maximum likelihood principle. As well as an analysis including the incompletely measured covariates, such tools also allow further strategies of data analysis. For example, we can use surrogate variables to improve the prediction of missing values, or we can try to investigate a questionable "missing at random" assumption. We discuss these techniques using the example of a clinical study where one important covariate is missing for about half the subjects. Additionally, we consider two further issues: evaluation of differences between estimates from a complete case analysis and those from analyses using all subjects, and assessment of the predictive value of missing values.

MeSH terms

  • Data Interpretation, Statistical
  • Humans
  • Likelihood Functions
  • Models, Statistical*
  • Neoplasm, Residual / diagnosis
  • Neoplasm, Residual / surgery
  • Predictive Value of Tests
  • Prognosis
  • Randomized Controlled Trials as Topic / methods*
  • Regression Analysis
  • Sensitivity and Specificity
  • Stochastic Processes
  • Tomography, X-Ray Computed