Speaking Patient's Language: Assessment of Readability and Fidelity of Artificial Intelligence-Optimized Consent Forms

J Surg Res. 2026 Jun:322:317-325. doi: 10.1016/j.jss.2026.03.079. Epub 2026 Apr 15.

Abstract

Introduction: Informed consent requires patients to fully comprehend the risks, benefits, and alternatives of an intervention. The American Medical Association and the National Institutes of Health recommend that patient-facing materials be written at or below a sixth-grade reading level. We evaluated the baseline readability of informed consent forms used within the endocrine surgery division of a tertiary care center and determined whether rewriting them with a large language model (LLM)-based chatbot could bring the text to the recommended level while preserving fidelity.

Methods: Eight consent forms (two institutional procedural forms and six prospective trial documents) underwent readability assessment. Each form was processed by the LLM in two separate, independent sessions. Pre- and postedit readability scores were compared. Three independent reviewers assessed content fidelity by calculating precision, recall, and F1 scores (harmonic mean balancing precision and recall). Inter-rater reliability was evaluated using the intraclass correlation coefficient.
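
As an illustration of the fidelity metrics named above, the following is a minimal sketch and not the study's actual scoring procedure. It assumes each form can be reduced to a set of discrete content items (for example, individual risks or alternatives); the function name `fidelity_scores` and the example items are hypothetical. Under that assumption, precision falls when the rewrite contains items absent from the original (hallucinations), and recall falls when original items are missing from the rewrite (omissions).

```python
def fidelity_scores(original_items: set, rewritten_items: set) -> tuple:
    """Return (precision, recall, F1) for a rewritten form.

    Assumption: content is represented as sets of discrete items,
    which is a simplification chosen for this sketch.
    """
    # Items present in both the original and the rewrite.
    preserved = len(original_items & rewritten_items)

    # Precision: share of rewritten items that are supported by the original.
    precision = preserved / len(rewritten_items) if rewritten_items else 0.0

    # Recall: share of original items that survived the rewrite.
    recall = preserved / len(original_items) if original_items else 0.0

    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the rewrite adds nothing new but drops two items.
original = {"bleeding", "infection", "nerve injury", "scarring", "alternatives"}
rewritten = {"bleeding", "infection", "scarring"}

precision, recall, f1 = fidelity_scores(original, rewritten)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this hypothetical case precision is perfect while recall is reduced, which is the same pattern of few additions but notable omissions that the metrics are designed to expose.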

Results: Original forms averaged 14.1 ± 1.3 grade levels. First LLM revision significantly improved readability to an 8.8 ± 1.2 grade level (P < 0.01), a five-grade reduction. Second LLM revision showed no further improvement (9.9 ± 1.2; P = 0.87). The mean F1 score was 0.71 ± 0.26, with high precision (0.95 ± 0.06) but lower recall (0.62 ± 0.16), indicating few hallucinations but frequent content omissions. Greater reductions in reading level were significantly associated with decreased content fidelity (r = 0.73, P < 0.01). Inter-rater agreement was excellent (K = 0.99, P < 0.01).

Conclusions: LLM-based editing significantly improved consent form readability but resulted in substantial content omissions. These findings demonstrate the potential of LLMs for advancing health literacy while highlighting the critical need for human review to ensure completeness and fidelity.

Keywords: Health literacy; Informed consent; Large language models; Readability.

MeSH terms

  • Artificial Intelligence*
  • Comprehension*
  • Consent Forms* / standards
  • Health Literacy*
  • Humans
  • Informed Consent* / standards
  • Language*
  • Prospective Studies