Background: Heart transplant programs and regulatory entities require highly accurate performance metrics to support internal quality improvement activities and national oversight of transplant programs, respectively. We assessed the accuracy of publicly reported performance measures.
Methods: We used the United Network for Organ Sharing registry to study patients who underwent heart transplantation between January 1, 2016, and June 30, 2018. We used tests of calibration to compare the observed rate of 1-year graft failure with the expected risk of 1-year graft failure, which was calculated for each recipient using the July 2019 method published by the Scientific Registry of Transplant Recipients (SRTR). The primary study outcome was the joint test of calibration, which accounts for both the total number of events predicted (calibration-in-the-large) and the dispersion of risk predictions (calibration slope).
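The calibration tests described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. The abstract gives no formulas, so the sketch assumes the common logistic recalibration framework, logit P(failure) = a + b * logit(p_hat). Under this framework, perfect calibration means a = 0 and b = 1, and the joint test is a 2-degree-of-freedom likelihood-ratio test of that null. Calibration-in-the-large is summarized here as the observed/expected (O/E) ratio; whether the paper's statistic is defined this way is an assumption. The sketch also treats 1-year graft failure as a binary outcome without censoring, which the SRTR method may handle differently. All function names are hypothetical, and the fit assumes the predicted risks are not all identical.

```python
import math

# Hypothetical helpers. Assumes logistic recalibration and a binary 1-year
# graft-failure outcome (1 = failed, 0 = functioning) with no censoring.

def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(y, lp, a, b):
    """Bernoulli log-likelihood when logit P(y = 1) = a + b * lp."""
    total = 0.0
    for yi, li in zip(y, lp):
        p = expit(a + b * li)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_recalibration(y, p_hat, max_iter=100, tol=1e-10):
    """Fit logit P(y = 1) = a + b * logit(p_hat) by Newton-Raphson."""
    lp = [logit(p) for p in p_hat]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, li in zip(y, lp):
            p = expit(a + b * li)
            w = p * (1.0 - p)      # Fisher information weight
            g0 += yi - p           # score for the intercept a
            g1 += (yi - p) * li    # score for the slope b
            h00 += w
            h01 += w * li
            h11 += w * li * li
        # Newton step: solve [[h00, h01], [h01, h11]] (da, db) = (g0, g1)
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def calibration_summary(y, p_hat):
    """O/E ratio, recalibration intercept and slope, joint LR test of a=0, b=1."""
    a, b = fit_recalibration(y, p_hat)
    lp = [logit(p) for p in p_hat]
    lr = 2.0 * (log_likelihood(y, lp, a, b) - log_likelihood(y, lp, 0.0, 1.0))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    return {
        "oe_ratio": sum(y) / sum(p_hat),  # below 1: model overestimates events
        "intercept": a,
        "slope": b,                       # below 1: predictions too dispersed
        "joint_lr": lr,
        "joint_p": math.exp(-lr / 2.0),   # chi-square survival function, 2 df
    }
```

An O/E ratio below 1 means fewer failures occurred than the model expected. A slope below 1 means the predicted risks are more spread out than the observed outcomes support.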
Results: A total of 6,528 heart transplants were analyzed. The primary (joint) test of calibration failed (p < 0.0001), indicating poor accuracy of the SRTR model. The calibration-in-the-large statistic (0.63, 95% confidence interval [CI] 0.58–0.68, p < 0.0001) indicated overestimation of event rates, whereas the calibration slope (0.56, 95% CI 0.49–0.62, p < 0.0001) indicated over-dispersion of predicted event rates. All pre-specified subgroup analyses also showed poor calibration (each p < 0.01). After recalibration, program-level observed/expected ratios increased by a median of 0.14 (p < 0.0001).
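The program-level effect of recalibration can be illustrated with a short Python sketch. It rests on three assumptions. First, each recipient's expected risk is recalibrated with a logistic intercept a and slope b, for example as fitted in a calibration analysis. Second, a program's O/E ratio is its observed graft failures divided by the sum of its recipients' expected risks. Third, the abstract does not state the exact recalibration procedure, so this is one plausible version. The function names, the record layout, and any values of a and b used here are hypothetical.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical helpers; assumes logistic recalibration of each expected risk.

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def program_oe(records, a=0.0, b=1.0):
    """Per-program observed/expected (O/E) ratio of 1-year graft failures.

    records: iterable of (program_id, failed, expected_risk) tuples, where
    failed is 0 or 1. Each expected risk is mapped through
    expit(a + b * logit(risk)); a = 0, b = 1 leaves it unchanged.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for program, failed, risk in records:
        observed[program] += failed
        expected[program] += expit(a + b * logit(risk))
    return {prog: observed[prog] / expected[prog] for prog in observed}

def median_oe_change(records, a, b):
    """Median across programs of (recalibrated O/E) minus (original O/E)."""
    records = list(records)
    before = program_oe(records)
    after = program_oe(records, a, b)
    return statistics.median([after[p] - before[p] for p in before])
```

For example, with hypothetical values a = -0.5 and b = 1.0, every expected risk shrinks. Each program's O/E ratio therefore rises, and the median change is positive, which is the direction expected when the original model overestimates risk.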
Conclusions: Risk models used for publicly reported graft survival metrics at U.S. heart transplant centers lack accuracy, both overall and in all subgroups tested. The use of disease-specific models may improve the accuracy of program performance metrics.
Keywords: heart transplant; model calibration; risk model.
Copyright © 2021 International Society for Heart and Lung Transplantation. Published by Elsevier Inc. All rights reserved.