Objective: To assess and compare the performance of five severity-of-illness scoring systems commonly used for intensive care unit (ICU) patients in the United Kingdom (UK). The five models analyzed were versions II and III of the Acute Physiology and Chronic Health Evaluation (APACHE) system, a version of APACHE II using UK-derived coefficients (UK APACHE II), version II of the Simplified Acute Physiology Score (SAPS II), and version II of the Mortality Probability Model, computed at admission (MPM0) and after 24 hrs in the ICU (MPM24).
Design: A 2-yr prospective cohort study of consecutive admissions to intensive care units.
Setting: A total of 22 general ICUs in Scotland.
Patients: A total of 13,291 admissions were entered into the study; after prospectively agreed exclusions, 10,393 patients remained for the analysis.
Outcome measures: Death or survival at hospital discharge.
Measurements and main results: All the models showed reasonable discrimination, as measured by the area under the receiver operating characteristic curve (APACHE III, 0.845; APACHE II, 0.805; UK APACHE II, 0.809; SAPS II, 0.843; MPM0, 0.785; MPM24, 0.799). Observed mortality was significantly different from that predicted by all models, using the Hosmer-Lemeshow goodness-of-fit C test (p < .001), and calibration curves confirmed the results of the test. When patients discharged in the first 24 hrs were excluded to allow comparisons within the same patient group, APACHE III, MPM24, and SAPS II (APACHE III, 0.795; MPM24, 0.791; SAPS II, 0.784) showed significantly better discrimination than APACHE II, UK APACHE II, and MPM0 (APACHE II, 0.763; UK APACHE II, 0.756; MPM0, 0.741). However, calibration changed little for all models, with observed mortality still significantly different from that predicted by the scoring systems (p < .001). For equivalent data sets, APACHE II demonstrated calibration superior to that of all the other models, based on the chi-squared value from the Hosmer-Lemeshow test for both populations (APACHE III, 366; APACHE II, 67; UK APACHE II, 237; SAPS II, 142; MPM0, 452; MPM24, 101).
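For readers unfamiliar with the two statistics reported here, the following is a minimal illustrative sketch, not the study's own analysis code. It assumes each patient has a model-predicted probability of hospital death and a 0/1 outcome; the function names `auc` and `hosmer_lemeshow_c`, the choice of ten equal-sized risk groups, and the toy data are all assumptions made for illustration.

```python
def auc(probs, died):
    """Area under the ROC curve: the fraction of (death, survivor) pairs
    in which the death received the higher predicted risk (ties count 0.5)."""
    pos = [p for p, d in zip(probs, died) if d == 1]
    neg = [p for p, d in zip(probs, died) if d == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_c(probs, died, groups=10):
    """Hosmer-Lemeshow C statistic: sort patients by predicted risk, split
    them into equal-sized groups, and sum (observed - expected)^2 / expected
    over deaths and survivors in each group."""
    pairs = sorted(zip(probs, died))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        exp_deaths = sum(p for p, _ in chunk)
        obs_deaths = sum(d for _, d in chunk)
        exp_surv = len(chunk) - exp_deaths
        obs_surv = len(chunk) - obs_deaths
        if exp_deaths > 0:
            chi2 += (obs_deaths - exp_deaths) ** 2 / exp_deaths
        if exp_surv > 0:
            chi2 += (obs_surv - exp_surv) ** 2 / exp_surv
    return chi2


# Toy data (hypothetical, not from the study).
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_died = [0, 0, 0, 1, 0, 1, 1, 1]
toy_auc = auc(toy_probs, toy_died)
toy_c = hosmer_lemeshow_c(toy_probs, toy_died, groups=2)
```

A larger C statistic indicates a bigger gap between observed and predicted deaths, which is why the lowest chi-squared value is read as the best calibration. The statistic is referred to a chi-squared distribution to obtain a p value; the degrees of freedom depend on whether the model was fitted to the same data or applied to an external sample, as it is here.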
Conclusions: SAPS II demonstrated the best overall performance, but the superior calibration of APACHE II makes it the most appropriate model for comparisons of mortality rates in different ICUs. The significance of the Hosmer-Lemeshow C test in all the models suggests that new logistic regression coefficients should be generated and the systems retested before they can be used with confidence in Scottish ICUs.