Importance: Quality improvement platforms commonly use risk-adjusted morbidity and mortality to profile hospital performance. However, given small hospital caseloads and low event rates for some procedures, it is unclear whether these outcomes reliably reflect hospital performance.
Objective: To determine the reliability of risk-adjusted morbidity and mortality for hospital performance profiling using clinical registry data.
Design, setting, and participants: A retrospective cohort study was conducted using data from the American College of Surgeons National Surgical Quality Improvement Program, 2009. Participants included all patients (N = 55,466) who underwent colon resection, pancreatic resection, laparoscopic gastric bypass, ventral hernia repair, abdominal aortic aneurysm repair, and lower extremity bypass.
Main outcomes and measures: Outcomes included risk-adjusted overall morbidity, severe morbidity, and mortality. We assessed reliability (0-1 scale: 0, completely unreliable; and 1, perfectly reliable) for all 3 outcomes. We also quantified the number of hospitals meeting minimum acceptable reliability thresholds (>0.70, good reliability; and >0.50, fair reliability) for each outcome.
Results: For overall morbidity, the most common outcome studied, the mean reliability depended on sample size (ie, how high the hospital caseload was) and the event rate (ie, how frequently the outcome occurred). For example, mean reliability for overall morbidity was low for abdominal aortic aneurysm repair (reliability, 0.29; sample size, 25 cases per year; and event rate, 18.3%). In contrast, mean reliability for overall morbidity was higher for colon resection (reliability, 0.61; sample size, 114 cases per year; and event rate, 26.8%). Colon resection (37.7% of hospitals), pancreatic resection (7.1% of hospitals), and laparoscopic gastric bypass (11.5% of hospitals) were the only procedures for which any hospitals met a reliability threshold of 0.70 for overall morbidity. Because severe morbidity and mortality are less frequent outcomes, their mean reliability was lower, and even fewer hospitals met the thresholds for minimum reliability.
Conclusions and relevance: Most commonly reported outcome measures have low reliability for differentiating hospital performance. This is especially important for clinical registries that sample rather than collect 100% of cases, which can limit hospital case accrual. Eliminating sampling to achieve the highest possible caseloads, adjusting for reliability, and using advanced modeling strategies (eg, hierarchical modeling) are necessary for clinical registries to increase their benchmarking reliability.