Simplifying a prognostic model: a simulation study based on clinical data

Stat Med. 2002 Dec 30;21(24):3803-22. doi: 10.1002/sim.1422.


Prognostic models are designed to predict a clinical outcome in individuals or groups of individuals with a particular disease or condition. To avoid bias, many researchers advocate the use of full models developed by prespecifying predictors. Variable selection is not employed, and the resulting models may be large and complicated. In practice, more parsimonious models that retain most of the prognostic information may be preferred. We investigate the effect on various performance measures, including mean square error and prognostic classification, of three methods for estimating full models (including penalized estimation and Tibshirani's lasso) and consider two methods (backward elimination and a new proposal called stepdown) for simplifying full models. Simulation studies based on two medical data sets suggest that simplified models can be found that perform nearly as well as, or sometimes even better than, full models. Optimizing the Akaike information criterion appears to be appropriate for choosing the degree of simplification.
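The idea of simplifying a full model by backward elimination, with the Akaike information criterion (AIC) deciding when to stop, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model family or the details of the stepdown procedure, so the example assumes an ordinary least-squares model with Gaussian errors, simulated data, and hypothetical function names (`gaussian_aic`, `backward_eliminate`) chosen only for illustration.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC (up to an additive constant) of a least-squares fit with intercept.

    Assumes Gaussian errors; this is an illustrative choice, not the
    model family used in the paper.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus error variance
    return n * np.log(rss / n) + 2 * k


def backward_eliminate(X, y):
    """Drop one predictor at a time while doing so lowers the AIC."""
    kept = list(range(X.shape[1]))
    best = gaussian_aic(X[:, kept], y)
    while kept:
        # AIC of every model obtained by removing a single remaining predictor
        candidates = [
            (gaussian_aic(X[:, [j for j in kept if j != d]], y), d) for d in kept
        ]
        aic, drop = min(candidates)
        if aic >= best:  # no single removal improves the AIC: stop
            break
        best = aic
        kept.remove(drop)
    return kept, best


# Simulated example: only the first two of five predictors carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

full_aic = gaussian_aic(X, y)
kept, simplified_aic = backward_eliminate(X, y)
print("kept predictors:", kept)
print("full AIC:", full_aic, "simplified AIC:", simplified_aic)
```

Because the search starts from the full model and accepts a removal only when the AIC decreases, the simplified model's AIC can never exceed that of the full model. For survival or binary outcomes, the same loop would presumably use the AIC of the corresponding likelihood-based fit in place of the Gaussian formula.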

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aortic Aneurysm, Abdominal / mortality
  • Aortic Aneurysm, Abdominal / surgery
  • Breast Neoplasms / mortality
  • Disease-Free Survival
  • Female
  • Humans
  • Likelihood Functions*
  • Models, Biological*
  • Risk Factors
  • Survival Analysis