Delirium Identification from Nursing Reports Using Large Language Models

Stud Health Technol Inform. 2025 May 15:327:886-887. doi: 10.3233/SHTI250492.

Abstract

This study investigates large language models for delirium detection from nursing reports, comparing keyword matching, prompting, and finetuning. Using a manually labelled dataset from the University Hospital Freiburg, Germany, we tested Llama3 and Phi3 models. Both prompting and finetuning were effective, with finetuning Phi3 (3.8B) achieving the highest accuracy (90.24%) and AUROC (96.07%), significantly outperforming other methods.
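The keyword-matching baseline and the two reported metrics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword stems, the toy reports, and the labels are invented and are not taken from the study (the real reports are from a German hospital and would presumably be in German), and the abstract does not say how the authors computed accuracy or AUROC. The sketch scores each report by counting keyword hits, then computes accuracy at a threshold and AUROC as the probability that a randomly chosen positive report outscores a randomly chosen negative one.

```python
import re

# Hypothetical keyword stems; the study's actual keyword list is not given in the abstract.
KEYWORDS = ["delir", "confus", "disorient", "hallucinat"]
PATTERN = re.compile("|".join(map(re.escape, KEYWORDS)), re.IGNORECASE)


def keyword_score(report: str) -> int:
    """Count keyword hits in one report; a higher count is more delirium-suspicious."""
    return len(PATTERN.findall(report))


def accuracy(labels, scores, threshold=1):
    """Fraction of reports whose thresholded score matches the manual label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = delirium documented, 0 = no delirium.
reports = [
    "Patient confused and disoriented at night, suspected delirium.",
    "Patient calm, slept well.",
    "Patient restless, pulling at lines.",
    "Mobilised without problems.",
]
labels = [1, 0, 1, 0]

scores = [keyword_score(r) for r in reports]
print("scores:  ", scores)
print("accuracy:", accuracy(labels, scores))
print("AUROC:   ", auroc(labels, scores))
```

The third toy report shows the typical weakness of keyword matching: it describes delirium-like behaviour without using any listed term, so the baseline misses it. Prompted or finetuned models would replace `keyword_score` with a model-derived probability, while the metric functions would stay the same.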

Keywords: Delirium; Electronic Health Records; Large Language Models.

MeSH terms

  • Data Mining* / methods
  • Delirium* / classification
  • Delirium* / diagnosis
  • Delirium* / nursing
  • Diagnosis, Computer-Assisted* / methods
  • Electronic Health Records*
  • Germany
  • Humans
  • Large Language Models
  • Natural Language Processing*
  • Nursing Diagnosis* / methods
  • Nursing Records* / classification
  • Reproducibility of Results
  • Sensitivity and Specificity