Assessing the Accuracy of Artificial Intelligence-Generated Clinical Summaries From Ambulatory Glaucoma Subspecialty Clinical Encounters

Transl Vis Sci Technol. 2026 Jan 5;15(1):22. doi: 10.1167/tvst.15.1.22.

Abstract

Purpose: The purpose of this study was to evaluate the accuracy of the large language model (LLM) LLaMA 2-70B in summarizing glaucoma clinic notes into patient-friendly language and generating educational material.

Methods: A random sample of 147 clinic notes from unique patients who visited the Glaucoma Service at a tertiary center was analyzed. LLaMA 2 generated paragraph and bullet-point summaries covering five topics: (1) glaucoma diagnosis and type, (2) disease progression, (3) treatment plan, (4) treatment changes, and (5) surgical/laser interventions. Two ophthalmologists reviewed responses for accuracy and categorized them as "correct," "partially correct," or "incorrect." Discrepancies were adjudicated by a glaucoma specialist. A comparison using identical prompts was performed on a subset (n = 50) with ChatGPT-4.
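The two-reviewer grading with specialist adjudication described above can be sketched in a few lines of Python. This is only an illustration of the workflow as stated in the abstract, not the authors' implementation; the function name `final_label` and its parameters are hypothetical.

```python
from typing import Optional

# The three grading categories named in the Methods.
LABELS = {"correct", "partially correct", "incorrect"}


def final_label(reviewer_a: str, reviewer_b: str,
                adjudicator: Optional[str] = None) -> str:
    """Return the final grade for one summary (hypothetical helper).

    If both reviewers agree, their shared grade stands. Otherwise the
    adjudicator's grade is used, mirroring the rule that discrepancies
    were adjudicated by a glaucoma specialist.
    """
    for label in (reviewer_a, reviewer_b):
        if label not in LABELS:
            raise ValueError(f"unknown grade: {label!r}")
    if reviewer_a == reviewer_b:
        return reviewer_a
    if adjudicator is None:
        raise ValueError("reviewers disagree; an adjudicated grade is required")
    if adjudicator not in LABELS:
        raise ValueError(f"unknown grade: {adjudicator!r}")
    return adjudicator


# Agreement needs no adjudication; disagreement defers to the specialist.
agreed = final_label("correct", "correct")
resolved = final_label("correct", "incorrect", adjudicator="partially correct")
```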

Results: LLaMA 2 correctly summarized 97 notes (66%) in paragraph format and 103 (70%) in bullet format. Another 44 (30%) and 41 (28%) were partially correct, respectively. Paragraph summaries were more accurate and complete for glaucoma suspects than for diagnosed patients (82% vs. 53%, P < 0.001). For targeted clinical questions, LLaMA 2 accurately identified glaucoma diagnosis in 118 notes (80%), disease stability/progression in 129 (88%), treatment plans in 127 (87%), treatment changes in 134 (91%), and surgical/laser interventions in 124 (84%). On the comparison subset, ChatGPT-4 achieved 46% correct paragraph summaries, 50% correct bullet summaries, and accuracies of 96%, 88%, 64%, 78%, and 82%, respectively, for the same five targeted questions.
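As a quick arithmetic check, the summary-format percentages above can be reproduced from the reported counts and the sample size of 147 notes. The sketch below is illustrative only. The "incorrect" counts are not reported in the abstract; they are derived here under the assumption that every note fell into exactly one of the three grading categories.

```python
N_NOTES = 147  # sample size reported in the Methods

# Counts reported in the Results for LLaMA 2 summaries.
counts = {
    "paragraph": {"correct": 97, "partially correct": 44},
    "bullet": {"correct": 103, "partially correct": 41},
}


def pct(count: int, total: int = N_NOTES) -> int:
    """Percentage of notes, rounded to the nearest whole number."""
    return round(100 * count / total)


def breakdown(fmt: str) -> dict:
    """Return (count, percent) per grade for one summary format.

    The 'incorrect' entry is an assumption: it is whatever remains
    after subtracting the two reported categories from N_NOTES.
    """
    reported = counts[fmt]
    incorrect = N_NOTES - sum(reported.values())
    grades = {**reported, "incorrect": incorrect}
    return {grade: (n, pct(n)) for grade, n in grades.items()}


paragraph = breakdown("paragraph")
bullet = breakdown("bullet")
for fmt, result in (("paragraph", paragraph), ("bullet", bullet)):
    print(fmt, result)
```

The rounded values for the "correct" and "partially correct" grades match the percentages stated in the Results.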

Conclusions: Although LLaMA 2 is not yet reliable as a standalone clinical tool, it shows promise for improving clinical communication.

Translational relevance: LLMs may enhance patient experience and health literacy by standardizing patient-friendly language in clinical care.

MeSH terms

  • Aged
  • Artificial Intelligence*
  • Electronic Health Records*
  • Female
  • Glaucoma* / diagnosis
  • Glaucoma* / therapy
  • Humans
  • Male
  • Middle Aged