Accuracy, Clarity, and Comprehensiveness of ChatGPT Outputs for Commonly Asked Questions About Living Kidney Donation

Clin Transplant. 2025 Sep;39(9):e70303. doi: 10.1111/ctr.70303.

Abstract

Introduction: The effectiveness of ChatGPT responses to common living kidney donation (LKD) queries remains unclear.

Methods: We surveyed nephrologists and living kidney donors/candidates to evaluate the accuracy, comprehensiveness, and clarity of ChatGPT-3.5's answers to common donation questions in English and French. Responses were rated on a 5-point Likert scale, and inter-rater consistency was measured with percentage agreement and a modified Fleiss' Kappa.
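The abstract does not say how its Fleiss' Kappa was modified, so the sketch below only illustrates the general calculation under stated assumptions. It computes the standard Fleiss' Kappa from 5-point Likert ratings. An optional `free_marginal` switch sets chance agreement to 1/k, as in Randolph's free-marginal variant; this is offered as one possible "modification" and is not necessarily the one the authors used. The helper names and the example ratings are hypothetical and are not study data.

```python
from typing import List, Sequence


def ratings_to_counts(ratings: Sequence[Sequence[int]], k: int = 5) -> List[List[int]]:
    """Turn per-item Likert scores (1..k) into an N x k matrix of category counts.

    ratings[i] holds every rater's score for item i (e.g. one ChatGPT answer).
    """
    counts = []
    for item in ratings:
        row = [0] * k
        for score in item:
            if not 1 <= score <= k:
                raise ValueError(f"rating {score} outside 1..{k}")
            row[score - 1] += 1
        counts.append(row)
    return counts


def fleiss_kappa(counts: Sequence[Sequence[int]], free_marginal: bool = False) -> float:
    """Fleiss' Kappa for N items, each rated by the same number n of raters.

    counts[i][j] = number of raters who placed item i in category j.
    free_marginal=True uses chance agreement 1/k (an assumed, illustrative
    variant) instead of deriving it from the observed category proportions.
    """
    if not counts:
        raise ValueError("need at least one item")
    k = len(counts[0])
    n = sum(counts[0])
    if n < 2 or any(len(row) != k or sum(row) != n for row in counts):
        raise ValueError("every item needs the same k categories and n >= 2 raters")
    N = len(counts)

    # Observed agreement per item: the fraction of rater pairs that agree.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N

    if free_marginal:
        p_e = 1.0 / k
    else:
        # Expected chance agreement from overall category proportions.
        p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
        p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: three raters score two answers on a 5-point scale.
ratings = [[5, 5, 4], [4, 4, 5]]
counts = ratings_to_counts(ratings)
print(round(fleiss_kappa(counts), 3))                      # -0.333
print(round(fleiss_kappa(counts, free_marginal=True), 3))  # 0.167
```

The example shows why the choice of chance-agreement model matters. On identical ratings, the standard statistic is negative because only two of the five categories are used, which drives expected agreement up to 0.5. The free-marginal version assumes uniform chance agreement (0.2) and therefore comes out positive.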

Results: The evaluation of ChatGPT-3.5's responses varied between nephrologists and kidney donors/candidates. Nephrologists showed moderate percentage agreement for English responses (50%-59%) and poor agreement for French responses (9%-45%). Kidney donors/candidates exhibited high agreement for English (90%-100%) but low for French (0%-77%). Inter-rater agreement among nephrologists was moderate for both English (Kappa 0.74, 95% CI: 0.67, 0.79, p < 0.0001) and French (Kappa 0.70, 95% CI: 0.64, 0.77, p < 0.0001). In contrast, inter-rater agreement was poor among donors/candidates for both English (Kappa -0.10, 95% CI: -0.14, -0.07, p = 0.99) and French (Kappa -0.03, 95% CI: -0.07, 0, p = 0.81).

Conclusion: ChatGPT-3.5's responses to common LKD queries showed limited agreement among nephrologists and among kidney donors/candidates, highlighting its lack of reliability as a supplement to existing educational materials for living kidney donor programs in English and French.

MeSH terms

  • Adult
  • Female
  • Generative Artificial Intelligence
  • Humans
  • Kidney Transplantation*
  • Living Donors* / psychology
  • Male
  • Middle Aged
  • Nephrectomy*
  • Nephrologists*
  • Prognosis
  • Surveys and Questionnaires / standards
  • Tissue and Organ Procurement*