A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record

medRxiv [Preprint]. 2023 Sep 4:2023.08.31.23294924. doi: 10.1101/2023.08.31.23294924.

Abstract

Importance: Large language models (LLMs) have proven useful for extracting data from publicly available sources, but their uses in clinical settings and with clinical data are unknown.

Objective: To determine the accuracy of data extraction using "Versa Chat," a chat implementation of the general-purpose OpenAI gpt-35-turbo LLM, versus manual chart review for hepatocellular carcinoma (HCC) imaging reports.

Design: We engineered a prompt to extract six distinct data elements and input 182 abdominal imaging reports, which were also tagged by manual chart review. We evaluated performance by calculating accuracy, precision, recall, and F1 scores.
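
The four metrics named in the design come from a confusion matrix built per data element. The following is a minimal sketch, assuming each data element is scored as a binary present/absent label for each report and that manual chart review serves as the reference standard. The function name `extraction_metrics` and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def extraction_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> dict[str, float]:
    """Score LLM-extracted labels against manual chart review for one data element.

    Assumption (not from the source): labels are binary and manual review is ground truth.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # both say present
    tn = sum(1 for r, p in pairs if not r and not p)  # both say absent
    fp = sum(1 for r, p in pairs if not r and p)      # LLM says present, review says absent
    fn = sum(1 for r, p in pairs if r and not p)      # LLM misses an element review found

    # Convention chosen here: a metric with a zero denominator is reported as 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for five reports, for illustration only.
manual = [True, True, False, False, True]
llm = [True, False, False, True, True]
print(extraction_metrics(manual, llm))
```

With these toy labels there are 2 true positives, 1 true negative, 1 false positive, and 1 false negative, so accuracy is 3/5 and precision, recall, and F1 are each 2/3. Reporting all four matters because accuracy alone can look high when a data element is rarely present in the reports.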

Setting/participants: Cross-sectional abdominal imaging reports of patients diagnosed with hepatocellular carcinoma enrolled in the Functional Assessment in Liver Transplantation (FrAILT) study.
