Introduction: Patients increasingly use large language models for health-related information, but the reliability and usefulness of these tools remain controversial. Continuous assessment is essential to define their role in patient education. This study evaluates the performance of ChatGPT 3.5 and Gemini in answering patient inquiries about endodontic pain.
Methods: A total of 62 frequently asked questions on endodontic pain were categorized into etiology, symptoms, management, and incidence. Responses from ChatGPT 3.5 and Gemini were assessed using standardized tools: the Global Quality Score (GQS); the Completeness, Lack of false information, Evidence supported, Appropriateness, and Relevance reliability tool; and readability indices (Flesch-Kincaid and Simple Measure of Gobbledygook).
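For reference, the readability indices named above are conventionally calculated with the standard published formulas shown below; it is assumed here that the study applied these standard variants.

$$\text{Flesch-Kincaid Grade Level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$

$$\text{SMOG grade} = 1.0430\sqrt{\text{polysyllable count}\times\frac{30}{\text{sentence count}}} + 3.1291$$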
Results: ChatGPT 3.5 responses scored significantly higher than Gemini responses for overall quality (GQS: 4.67-4.9 vs 2.5-4, P < .001) and reliability (Completeness, Lack of false information, Evidence supported, Appropriateness, and Relevance: 23.5-23.6 vs 19.35-22.7, P < .05). However, ChatGPT 3.5 responses required a higher reading level (Simple Measure of Gobbledygook: 14-17.6) than Gemini responses (8.7-11.3, P < .001). Gemini's responses were more readable (6th-7th grade level) but lacked depth and completeness.
Conclusion: While ChatGPT 3.5 outperformed Gemini in quality and reliability, its complex language reduced accessibility. In contrast, Gemini's simpler language enhanced readability but sacrificed comprehensiveness. These findings highlight the need for professional oversight in integrating artificial intelligence-driven tools into healthcare communication to ensure accurate, accessible, and empathetic patient education.
Keywords: ChatGPT 3.5; Gemini; endodontic pain; large language models (LLMs); patient education.
Copyright © 2025 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.