Objective: To improve the accuracy of the Acute Physiology and Chronic Health Evaluation (APACHE) method for predicting hospital mortality among critically ill adults and to evaluate changes in the accuracy of earlier APACHE models.
Design: Observational cohort study.
Setting: A total of 104 intensive care units (ICUs) in 45 U.S. hospitals.
Patients: A total of 131,618 consecutive ICU admissions during 2002 and 2003, of which 110,558 met inclusion criteria and had complete data.
Interventions: None.
Measurements and main results: We developed APACHE IV using ICU day 1 information and a multivariate logistic regression procedure to estimate the probability of hospital death for randomly selected patients who comprised 60% of the database. Predictor variables were similar to those in APACHE III, but new variables were added and different statistical modeling was used. We assessed the accuracy of APACHE IV predictions by comparing observed and predicted hospital mortality for the remaining patients, who were excluded from model development (validation set). We tested discrimination and used multiple tests of calibration in aggregate and for patient subgroups. APACHE IV had good discrimination (area under the receiver operating characteristic curve = 0.88) and calibration (Hosmer-Lemeshow C statistic = 16.9, p = .08). For 90% of 116 ICU admission diagnoses, the ratio of observed to predicted mortality was not significantly different from 1.0. We also used the validation data set to compare the accuracy of APACHE IV predictions with that of APACHE III versions developed 7 and 14 years previously. There was little change in discrimination, but aggregate mortality was systematically overestimated as model age increased. When examined across diseases, predictive accuracy was maintained for some diagnoses, but for others it changed in ways that seemed to reflect changes in practice or therapy.
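The three accuracy measures named above (area under the ROC curve, the Hosmer-Lemeshow C statistic, and the ratio of observed to predicted mortality) can be illustrated with a minimal Python sketch. This is not the APACHE IV implementation: the function names, the synthetic data, the choice of 10 equal-size risk groups, and the use of 10 degrees of freedom for a validation sample are all assumptions made for illustration only.

```python
import math
import random


def auc(died, risk):
    """Area under the ROC curve: the probability that a randomly chosen
    non-survivor has a higher predicted risk than a randomly chosen
    survivor (ties count as one half)."""
    pos = [p for y, p in zip(died, risk) if y == 1]
    neg = [p for y, p in zip(died, risk) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_c(died, risk, groups=10):
    """Hosmer-Lemeshow C statistic over equal-size groups of patients
    sorted by predicted risk (assumed grouping: deciles by default)."""
    pairs = sorted(zip(risk, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted deaths in group
        observed = sum(y for _, y in chunk)   # observed deaths in group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


def observed_to_predicted(died, risk):
    """Ratio of observed deaths to the sum of predicted probabilities."""
    return sum(died) / sum(risk)


def demo(seed=0, n=2000):
    """Evaluate the measures on synthetic, well-calibrated predictions."""
    rng = random.Random(seed)
    risk = [rng.betavariate(1, 6) for _ in range(n)]      # hypothetical risks
    died = [1 if rng.random() < p else 0 for p in risk]   # simulated outcomes
    c_stat = hosmer_lemeshow_c(died, risk)
    return {
        "auc": auc(died, risk),
        "hl_c": c_stat,
        "hl_p": chi2_sf_even(c_stat, 10),  # assumed df = number of groups
        "o_to_p": observed_to_predicted(died, risk),
    }


if __name__ == "__main__":
    print(demo())
```

Under the assumption of 10 degrees of freedom, `chi2_sf_even(16.9, 10)` evaluates to roughly 0.077, which is in line with the reported p = .08. A nonsignificant p value here indicates no detectable difference between observed and predicted deaths across risk groups.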
Conclusions: APACHE IV predictions of hospital mortality have good discrimination and calibration and should be useful for benchmarking performance in U.S. ICUs. The accuracy of predictive models is dynamic and should be periodically retested. When accuracy deteriorates, models should be revised and updated.