Comparing ChatGPT's and surgeons' responses to thyroid-related questions from patients

J Clin Endocrinol Metab. 2024 Apr 10:dgae235. doi: 10.1210/clinem/dgae235. Online ahead of print.

Abstract

Background: For common thyroid-related conditions, which have high prevalence and long follow-up periods, ChatGPT may be a useful tool for answering patients' frequently asked questions. In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions.

Study design: First, we obtained 28 thyroid-related questions from the Huayitong app, which, together with two interfering questions, formed a final set of 30 questions. These questions were then answered separately by ChatGPT (on July 19, 2023) and by a junior specialist and a senior specialist (on July 20, 2023). Finally, 26 patients and 11 thyroid surgeons evaluated the responses on four dimensions: accuracy, comprehensiveness, compassion, and satisfaction.

Results: Across the 30 questions and responses, ChatGPT's speed of response was faster than that of the junior specialist (8.69 [7.53-9.48] vs. 4.33 [4.05-4.60], P < .001) and the senior specialist (8.69 [7.53-9.48] vs. 4.22 [3.36-4.76], P < .001). The word count of ChatGPT's responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs. 74.50 [51.75-84.75], P < .001) and the senior specialist (341.50 [301.00-384.25] vs. 104.00 [63.75-177.75], P < .001). ChatGPT received higher scores than both specialists for accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions.

Conclusions: ChatGPT performed better than both the junior and senior specialists in answering common thyroid-related questions, but further research is needed to validate ChatGPT's reasoning ability on complex thyroid questions.

Keywords: Artificial intelligence (AI); GPT-4; Thyroid-related common questions.