Can LLMs simplify operative notes? A comparative analysis in otorhinolaryngology

Eur Arch Otorhinolaryngol. 2026 Jan;283(1):477-489. doi: 10.1007/s00405-025-09758-2. Epub 2025 Nov 1.

Abstract

Introduction: Operative notes play a critical role in documenting surgical procedures and supporting medical communication. However, their technical language often makes them difficult for patients, non-medical readers, and even some healthcare professionals to understand. Large Language Models (LLMs) offer a novel opportunity to simplify such documents and make them more accessible. This study aims to quantify how six LLMs simplify otolaryngology operative notes and to compare their outputs in terms of readability, clinical accuracy, and clarity.

Materials and methods: In this study, 39 fictional operative notes specific to otolaryngologic surgery were simplified using six LLMs (GPT-4, GPT-4o, Claude 3.7, Gemini 2.0, DeepSeek, and Microsoft Copilot). The outputs were analyzed using eight different readability metrics and evaluated by two expert physicians in terms of medical accuracy and comprehensibility. Correlation analyses were also conducted across clinical subgroups (rhinology, otology, head and neck surgery).
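The abstract does not name the eight readability metrics that were used. As an illustration only, the sketch below computes one widely used metric, the Flesch Reading Ease score, where higher values indicate easier text. The function names, the heuristic syllable counter, and the two sample sentences are assumptions made for this example and are not taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


# Hypothetical sentences, written for this example only.
original = "A bilateral functional endoscopic sinus surgery was performed under general anesthesia."
simplified = "The doctor cleaned both sinuses. You were asleep."

print(f"original:   {flesch_reading_ease(original):.1f}")
print(f"simplified: {flesch_reading_ease(simplified):.1f}")
```

Because the syllable counter is approximate, the scores will differ slightly from those of dedicated readability tools. The comparison between an original note and its simplified version is more informative than either absolute value.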

Results: Claude 3.7 produced the most complex outputs, whereas GPT-4o, Gemini, and DeepSeek generated the most readable texts. According to expert evaluations, GPT-4 achieved the highest scores for medical accuracy, while GPT-4o received the highest ratings for clarity. Model performance varied across clinical subgroups.

Conclusion: LLMs are effective tools for simplifying medical texts; however, model selection should consider the target audience and clinical context, and all outputs must be verified by medical experts. When used in a controlled and validated manner, LLMs may contribute significantly to a new era of health communication.

Level of evidence: N/A.

Keywords: Large language models; Medical communication; Operative notes; Readability; Simplification.

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Comprehension
  • Data Accuracy
  • Documentation
  • Humans
  • Large Language Models*
  • Medical Records Systems, Computerized*
  • Otolaryngology* / methods
  • Otorhinolaryngologic Surgical Procedures*