Purpose: This study evaluates the performance of three large language models (LLMs) in generating patient education materials (PEMs) for thyroid eye disease (TED), with the goal of improving patients' understanding and awareness of TED.
Methods: We evaluated the performance of ChatGPT-4o, Claude 3.5, and Gemini 1.5 in generating PEMs for TED using different prompts. First, we generated TED patient education brochures from two prompts, A and B; prompt B additionally instructed the model to make the content simple enough for sixth graders. Next, we generated two types of responses to frequently asked questions (FAQs) about TED: standard responses and simplified responses, with the simplified responses optimized through specific prompts. All generated content was systematically evaluated across dimensions including quality, understandability, actionability, accuracy, and empathy. Readability was analyzed with the online tool Readable.com, using the Flesch-Kincaid Grade Level (FKGL) and the Simple Measure of Gobbledygook (SMOG).
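For readers unfamiliar with the two readability indices named above, the following is a minimal sketch of the standard published FKGL and SMOG formulas. It assumes the word, sentence, and syllable counts are already available; how Readable.com tokenizes text and counts syllables is not described here, so this is not a reproduction of that tool, and the function names are illustrative.

```python
import math


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the number of words with 3+ syllables.

    The count is normalized to a 30-sentence sample, as in the
    standard formula.
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    print(round(fkgl(words=100, sentences=5, syllables=150), 2))
    print(round(smog(polysyllables=30, sentences=30), 2))
```

Both indices return an approximate U.S. school grade level, so a lower value indicates text that is easier to read.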
Results: Brochures generated from both prompt A and prompt B performed excellently in terms of quality (DISCERN ≥ 4), understandability (PEMAT Understandability ≥ 70%), accuracy (score ≥ 4), and empathy (score ≥ 4), with no significant differences between the two prompts. However, neither met the actionability standard (PEMAT Actionability < 70%). Regarding readability, brochures from prompt B were easier to read than those from prompt A, although even the optimized prompt B output did not reach the ideal readability level. Additionally, when LLM responses to FAQs about TED were compared with the corresponding Google results, the LLMs outperformed Google whether the responses were standard or simplified, and their evaluation results were similar to those obtained for the brochures.
Conclusion: Overall, LLMs demonstrate significant potential as a tool for generating PEMs for TED. They can produce high-quality, understandable, accurate, and empathetic content, but there is still room for improvement in readability.
Keywords: Artificial intelligence in healthcare; Large language models; Patient education materials; Thyroid eye disease.
© 2025. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.