Using Large Language Models to Generate Educational Materials on Childhood Glaucoma

Qais Dihan; Muhammad Z Chauhan; Taher K Eleiwa; Amr K Hassan; Ahmed B Sallam; Albert S Khouri; Ta C Chang; Abdelrahman M Elhusseiny

doi:10.1016/j.ajo.2024.04.004

Am J Ophthalmol. 2024 Apr 11:S0002-9394(24)00144-2. doi: 10.1016/j.ajo.2024.04.004. Online ahead of print.

Authors

Qais Dihan¹, Muhammad Z Chauhan², Taher K Eleiwa³, Amr K Hassan⁴, Ahmed B Sallam⁵, Albert S Khouri⁶, Ta C Chang⁷, Abdelrahman M Elhusseiny⁸

Affiliations

¹ Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA.
² Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA.
³ Department of Ophthalmology, Benha Faculty of Medicine, Benha University, Benha, Egypt.
⁴ Department of Ophthalmology, Faculty of Medicine, South Valley University, Qena, Egypt.
⁵ Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA; Department of Ophthalmology, Faculty of Medicine, Ain Shams University, Cairo, Egypt.
⁶ Institute of Ophthalmology & Visual Science, Rutgers New Jersey Medical School, Newark, NJ, USA.
⁷ Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL, USA.
⁸ Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA; Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA. Electronic address: AMELhusseiny@uams.edu.

PMID: 38614196
DOI: 10.1016/j.ajo.2024.04.004

Abstract

Purpose: To evaluate the quality, readability, and accuracy of large language model (LLM)-generated patient education materials (PEMs) on childhood glaucoma, and the ability of LLMs to improve the readability of existing online information.

Design: Cross-sectional comparative study.

Methods: We evaluated the responses of ChatGPT-3.5, ChatGPT-4, and Bard to three separate prompts requesting that they write PEMs on "childhood glaucoma." Prompt A required that PEMs be "easily understandable by the average American." Prompt B required that PEMs be written "at a 6th-grade level using Simple Measure of Gobbledygook (SMOG) readability formula." We then compared the responses' quality (DISCERN questionnaire, Patient Education Materials Assessment Tool (PEMAT)), readability (SMOG, Flesch-Kincaid Grade Level (FKGL)), and accuracy (Likert Misinformation scale). To assess improvement in the readability of existing online information, Prompt C requested that each LLM rewrite 20 resources from a Google search of the keyword "childhood glaucoma" to the American Medical Association-recommended "6th-grade level." Rewrites were compared on key metrics such as readability, complex words (≥3 syllables), and sentence count.
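For readers unfamiliar with the two readability metrics named in the Methods, the following is a minimal sketch of how SMOG and FKGL scores, complex-word counts (≥3 syllables), and sentence counts can be computed for a passage. It uses the standard published SMOG and Flesch-Kincaid formulas. The abstract does not say which software the authors used, so the function names (`count_syllables`, `readability`), the regex-based sentence and word splitting, and the vowel-group syllable heuristic are all illustrative assumptions; a dedicated readability tool would likely produce somewhat different scores.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption).

    A trailing silent 'e' (but not '-le') is discounted. Real syllable
    counts require a pronunciation dictionary; this is only a rough proxy.
    """
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def readability(text: str) -> dict:
    """Return SMOG, FKGL, complex-word count, and sentence count for `text`."""
    # Naive splitting: sentences end at ., !, or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # >=3 syllables
    n_sent, n_words = len(sentences), len(words)

    # SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    smog = 1.0430 * math.sqrt(complex_words * 30 / n_sent) + 3.1291
    # FKGL: 0.39 * words/sentence + 11.8 * syllables/word - 15.59
    fkgl = 0.39 * (n_words / n_sent) + 11.8 * (sum(syllables) / n_words) - 15.59

    return {
        "smog": smog,
        "fkgl": fkgl,
        "complex_words": complex_words,
        "sentences": n_sent,
    }


if __name__ == "__main__":
    sample = "Glaucoma can harm the eye. A doctor checks the pressure."
    print(readability(sample))
```

Both formulas return a U.S. school grade level, so a score of 6 or below corresponds to the 6th-grade target described above. SMOG was designed for samples of 30 sentences; the factor 30 / sentences normalizes shorter or longer passages to that length.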

Results: All 3 LLMs generated PEMs of high quality, understandability, and accuracy (DISCERN ≥4, PEMAT understandability ≥70%, Misinformation score = 1). Prompt B responses were more readable than Prompt A responses for all 3 LLMs (p≤0.001). ChatGPT-4 generated the most readable PEMs compared to ChatGPT-3.5 and Bard (p≤0.001). Although Prompt C responses showed consistent reductions in mean SMOG and FKGL scores, only ChatGPT-4 achieved the specified 6th-grade reading level (SMOG 4.8 ± 0.8 and FKGL 3.7 ± 1.9).

Conclusion: LLMs can serve as strong supplementary tools for generating high-quality, accurate, and novel PEMs, and for improving the readability of existing PEMs on childhood glaucoma.

Keywords: ChatGPT; Google; childhood glaucoma; glaucoma; large language models; online; pediatric glaucoma; quality; readability; patient education; reliability; websites.