Evaluation of the uniformity of fit of general outcome prediction models

Intensive Care Med. 1998 Jan;24(1):40-7. doi: 10.1007/s001340050513.


Objective: To compare the performance of the New Simplified Acute Physiology Score (SAPS II) and the New Admission Mortality Probability Model (MPM II0) within relevant subgroups, using a formal statistical assessment of uniformity of fit.

Design: Analysis of the database of a multi-centre, multi-national, prospective cohort study involving 89 ICUs in 12 European countries.

Setting: Database of EURICUS-I.

Patients: Data on 16,060 patients consecutively admitted to the ICUs were collected over a 4-month period. Following the original SAPS II and MPM II0 criteria, the following were excluded from the analysis: patients younger than 18 years; readmissions; patients with acute myocardial infarction; burn patients; patients in the post-operative period after coronary artery bypass surgery; and patients with an ICU length of stay shorter than 8 h. This left a total of 10,027 cases.
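The exclusion criteria above amount to a simple filter over admission records. The sketch below assumes a hypothetical `Admission` record; its field names are illustrative only and are not the EURICUS-I variable names. Keeping patients aged exactly 18 and stays of exactly 8 h is also an assumption about how the boundaries were applied.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical record layout; field names are illustrative assumptions.
    age_years: float
    is_readmission: bool
    acute_mi: bool          # acute myocardial infarction
    burn: bool
    post_cabg: bool         # post-operative after coronary artery bypass surgery
    icu_los_hours: float    # ICU length of stay in hours


def is_eligible(a: Admission) -> bool:
    """Return True if the admission survives the SAPS II / MPM II0 exclusions."""
    return (
        a.age_years >= 18           # exclude patients younger than 18
        and not a.is_readmission
        and not a.acute_mi
        and not a.burn
        and not a.post_cabg
        and a.icu_los_hours >= 8    # exclude ICU stays shorter than 8 h
    )


def apply_exclusions(admissions):
    """Keep only the eligible admissions, preserving their order."""
    return [a for a in admissions if is_eligible(a)]
```

Applied to the full cohort, a filter of this kind is what reduces the 16,060 collected admissions to the analysed sample.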

Interventions: Data needed to calculate SAPS II and MPM II0, basic demographic data and vital status at hospital discharge were recorded. Model performance was formally evaluated in terms of discrimination (area under the ROC curve), calibration (Hosmer-Lemeshow goodness-of-fit H and C tests) and observed/expected mortality ratios within relevant subgroups.
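The discrimination and calibration measures named above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the study's analysis code: the AUC is computed through the Mann-Whitney pairwise comparison, the C test forms ten groups of roughly equal size after sorting by predicted risk, and the H test forms groups from fixed 10% intervals of predicted risk. All function names are hypothetical. The degrees of freedom for the p-value (g - 2 is common in a development sample; g is often used for independent validation) is a choice left to the analyst.

```python
import math


def auc(probs, died):
    """Area under the ROC curve as the Mann-Whitney probability that a
    randomly chosen non-survivor has a higher predicted risk than a
    randomly chosen survivor (ties count one half). O(n^2): fine for a sketch."""
    pos = [p for p, d in zip(probs, died) if d]
    neg = [p for p, d in zip(probs, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5
    return score / (len(pos) * len(neg))


def _hl_statistic(groups):
    """Sum over non-empty groups of (O - E)^2 / (n * pbar * (1 - pbar))."""
    stat = 0.0
    for g in groups:
        if not g:
            continue  # empty risk band contributes nothing
        n = len(g)
        expected = sum(p for p, _ in g)
        observed = sum(1 for _, d in g if d)
        pbar = expected / n
        if 0.0 < pbar < 1.0:
            stat += (observed - expected) ** 2 / (n * pbar * (1.0 - pbar))
    return stat


def hosmer_lemeshow_c(probs, died, n_groups=10):
    """C statistic: sort by predicted risk, split into groups of roughly equal size."""
    pairs = sorted(zip(probs, died), key=lambda t: t[0])
    n = len(pairs)
    groups = [pairs[i * n // n_groups:(i + 1) * n // n_groups] for i in range(n_groups)]
    return _hl_statistic(groups)


def hosmer_lemeshow_h(probs, died, n_groups=10):
    """H statistic: groups are fixed intervals of predicted risk (0-0.1, ..., 0.9-1.0)."""
    groups = [[] for _ in range(n_groups)]
    for p, d in zip(probs, died):
        idx = min(int(p * n_groups), n_groups - 1)  # p == 1.0 goes in the top band
        groups[idx].append((p, d))
    return _hl_statistic(groups)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total
```

For example, `chi2_sf_even_df(hosmer_lemeshow_c(p, y), 10)` would give a p-value for ten C-test groups judged on 10 degrees of freedom; a small value points to poor calibration, while an AUC near 0.5 corresponds to the "almost random" discrimination described in the conclusions.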

Main results: Better predictive accuracy was achieved in elective surgery patients admitted from the operating room/post-anaesthesia room with gastrointestinal, neurological or trauma diagnoses, and in younger patients with non-operative neurological, septic or trauma diagnoses. All these characteristics appear to be linked to a lower severity of illness; both models overestimated mortality in the more severely ill patients.
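Uniformity of fit can be examined by comparing observed and expected deaths within each subgroup. In the hypothetical sketch below, the expected count for a subgroup is the sum of the model's predicted probabilities for its patients; an O/E ratio below 1 means the model overestimates mortality in that subgroup, as reported above for the more severely ill patients. The record layout and any subgroup labels are assumptions for illustration.

```python
from collections import defaultdict


def oe_by_subgroup(records):
    """Observed/expected mortality per subgroup.

    records: iterable of (subgroup_label, predicted_probability, died).
    Expected deaths are the sum of predicted probabilities in the subgroup.
    """
    n = defaultdict(int)
    observed = defaultdict(int)
    expected = defaultdict(float)
    for label, p, died in records:
        n[label] += 1
        observed[label] += int(died)
        expected[label] += p
    return {
        label: {
            "n": n[label],
            "observed": observed[label],
            "expected": expected[label],
            # O/E < 1: model overestimates mortality; O/E > 1: underestimates
            "oe_ratio": observed[label] / expected[label] if expected[label] > 0 else float("nan"),
        }
        for label in n
    }
```

A ratio close to 1 in every subgroup indicates uniform fit; ratios that drift away from 1 as severity rises reproduce the pattern described in the results.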

Conclusions: Model performance differed greatly across relevant subgroups, ranging from excellent to almost random predictive accuracy. These differences can explain some of the difficulties the models have in accurately predicting mortality when applied to populations with different baseline patient characteristics. This study stresses the importance of drawing on multiple, diverse populations to generate the design set, and of methods to improve the validation set, before extrapolations are made from the validation setting to new, independent populations. It also underlines the need for a better definition of baseline patient characteristics in the samples under analysis and for formal statistical evaluation of the models when they are applied to specific subgroups.

Publication types

  • Multicenter Study

MeSH terms

  • Europe
  • Forecasting / methods*
  • Humans
  • Intensive Care Units / statistics & numerical data*
  • Models, Theoretical*
  • Mortality
  • Prospective Studies
  • Severity of Illness Index*