Comparing ChatGPT's and surgeons' responses to thyroid-related questions from patients

J Clin Endocrinol Metab. 2024 Apr 10:dgae235. doi: 10.1210/clinem/dgae235. Online ahead of print.

Abstract

Background: For common thyroid-related conditions, which have high prevalence and long follow-up periods, ChatGPT may be a useful tool for answering patients' frequently asked questions. In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions.

Study design: First, we obtained 28 thyroid-related questions from the Huayitong app, which, together with two interfering questions, formed a final set of 30 questions. These questions were then answered separately by ChatGPT (on July 19, 2023) and by a junior specialist and a senior specialist (on July 20, 2023). Finally, 26 patients and 11 thyroid surgeons evaluated the responses on four dimensions: accuracy, comprehensiveness, compassion, and satisfaction.

Results: Across the 30 questions and responses, ChatGPT's speed of response was faster than that of the junior specialist (8.69 [7.53-9.48] vs. 4.33 [4.05-4.60], P < .001) and the senior specialist (8.69 [7.53-9.48] vs. 4.22 [3.36-4.76], P < .001). The word count of ChatGPT's responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs. 74.50 [51.75-84.75], P < .001) and the senior specialist (341.50 [301.00-384.25] vs. 104.00 [63.75-177.75], P < .001). ChatGPT received higher scores than both specialists for accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions.

Conclusions: ChatGPT performed better than both the junior and senior specialists in answering common thyroid-related questions, but further research is needed to validate ChatGPT's reasoning ability on complex thyroid questions.

Keywords: Artificial intelligence (AI); GPT-4; Thyroid-related common questions.