Comparison of artificial intelligence and human-based prediction and stratification of the risk of long-term kidney allograft failure

Gillian Divard; Marc Raynaud; Vasishta S Tatapudi; Basmah Abdalla; Elodie Bailly; Maureen Assayag; Yannick Binois; Raphael Cohen; Huanxi Zhang; Camillo Ulloa; Kamila Linhares; Helio S Tedesco; Christophe Legendre; Xavier Jouven; Robert A Montgomery; Carmen Lefaucheur; Olivier Aubert; Alexandre Loupy

doi:10.1038/s43856-022-00201-9

Commun Med (Lond). 2022 Nov 23;2(1):150. doi: 10.1038/s43856-022-00201-9.

Authors

Gillian Divard^#¹,², Marc Raynaud^#¹, Vasishta S Tatapudi³, Basmah Abdalla⁴, Elodie Bailly¹,⁵, Maureen Assayag⁶, Yannick Binois⁷, Raphael Cohen⁸, Huanxi Zhang⁹, Camillo Ulloa¹⁰, Kamila Linhares¹¹, Helio S Tedesco¹¹, Christophe Legendre¹,¹², Xavier Jouven¹,¹³, Robert A Montgomery³, Carmen Lefaucheur¹,², Olivier Aubert^#¹,¹², Alexandre Loupy^#¹⁴,¹⁵

Affiliations

¹ Université de Paris Cité, INSERM U970, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France.
² Kidney Transplant Department, Saint-Louis Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France.
³ NYU Langone Transplant Institute, NYU Langone Health, New York, NY, USA.
⁴ Department of Medicine, Division of Nephrology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
⁵ Department of Surgery, Thomas E. Starzl Transplantation Institute, University of Pittsburgh, Medical Center, Pittsburgh, PA, USA.
⁶ Kidney Transplant Department, Bicêtre Hospital, Assistance Publique - Hôpitaux de Paris, Kremlin-Bicêtre, France.
⁷ Medical Intensive Care Unit, Saint-Louis Hospital, Assistance Publique - Hôpitaux de Paris, Paris, France.
⁸ Department of Physiology, Assistance Publique-Hôpitaux de Paris, Hôpital Européen Georges Pompidou, Paris, France.
⁹ The First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China.
¹⁰ Clinica Alemana de Santiago, Santiago, Chile.
¹¹ Universidade Federal de Sao Paulo, Hospital do Rim, Escola Paulista de Medicina, Sao Paulo, Brazil.
¹² Kidney Transplant Department, Necker Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France.
¹³ Cardiology and Heart Transplant department, Pompidou hospital, Assistance Publique - Hôpitaux de Paris, Paris, France.
¹⁴ Université de Paris Cité, INSERM U970, PARCC, Paris Translational Research Centre for Organ Transplantation, Paris, France. alexandre.loupy@inserm.fr.
¹⁵ Kidney Transplant Department, Necker Hospital, Assistance Publique-Hôpitaux de Paris, Paris, France. alexandre.loupy@inserm.fr.

^# Contributed equally.

Abstract

Background: Clinical decisions are mainly driven by the ability of physicians to apply risk stratification to patients. However, this task is difficult as it requires complex integration of numerous parameters and is impacted by patient heterogeneity. We sought to evaluate the ability of transplant physicians to predict the risk of long-term allograft failure and to compare their predictions with those of a validated artificial intelligence (AI) prediction algorithm.

Methods: We randomly selected 400 kidney transplant recipients from a qualified dataset of 4000 patients. For each patient, 44 features routinely collected during the first year post-transplant were compiled in an electronic health record (EHR). We enrolled 9 transplant physicians at various career stages. At 1 year post-transplant, they blindly predicted long-term graft survival as a probability for each patient. Their predictions were compared with those of a validated prediction system (iBox). We assessed the determinants of each physician's prediction using a random forest survival model.

Results: Among the 400 patients included, 84 graft failures occurred at 7 years post-evaluation. The iBox system demonstrates the best predictive performance with a discrimination of 0.79 and a median calibration error of 5.79%, while physicians tend to overestimate the risk of graft failure. Physicians' risk predictions show wide heterogeneity with a moderate intraclass correlation of 0.58. The determinants of physicians' prediction are disparate, with poor agreement regardless of their clinical experience.
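
As an illustration of what a discrimination value such as 0.79 measures, the sketch below computes a concordance index on a small made-up cohort. It assumes that discrimination refers to Harrell's C-statistic, which the abstract does not state explicitly; the function name `concordance_index`, the tie-handling rule, and all data values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-statistic for right-censored data (illustrative sketch).

    times  : follow-up time for each patient
    events : 1 if graft failure was observed, 0 if censored
    risks  : predicted risk of failure (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # observed failure. Pairs with equal times are skipped here, which
        # is a simplifying assumption of this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier failure had the higher predicted risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical five-patient cohort (years of follow-up, failure flag, risk).
example_times = [2.0, 5.0, 7.0, 7.0, 3.5]
example_events = [1, 1, 0, 0, 0]
example_risks = [0.8, 0.2, 0.1, 0.3, 0.5]
example_c = concordance_index(example_times, example_events, example_risks)
```

A value of 0.5 corresponds to predictions no better than chance at ranking patients, and 1.0 to a perfect ranking. Applying the same function to each physician's predicted probabilities and to an algorithm's output would place them on a common scale, under the assumptions stated above.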

Conclusions: This study shows the overall limited performance and consistency of physicians in predicting the risk of long-term graft failure, as demonstrated by the superior performance of the iBox. This study supports the use of a companion tool to help physicians in their prognostic judgement and decision-making in clinical care.

Plain language summary

The ability to predict the risk of a particular event is key to clinical decision-making, for example when predicting the risk of a poor outcome to help decide which patients should receive an organ transplant. Computer-based systems may help to improve risk prediction, particularly with the increasing volume and complexity of patient data available to clinicians. Here, we compare predictions of the risk of long-term kidney transplant failure made by clinicians with those made by our computer-based system (the iBox system). We observe that clinicians’ overall performance in predicting individual long-term outcomes is limited compared to the iBox system, and demonstrate wide variability in clinicians’ predictions, regardless of level of experience. Our findings support the use of the iBox system in the clinic to help clinicians predict outcomes and make decisions surrounding kidney transplants.