A comparison of large language model versus manual chart review for extraction of data elements from the electronic health record

Jin Ge; Michael Li; Molly B Delk; Jennifer C Lai

doi:10.1101/2023.08.31.23294924

medRxiv [Preprint]. 2023 Sep 4:2023.08.31.23294924. doi: 10.1101/2023.08.31.23294924.

Authors

Jin Ge¹, Michael Li¹, Molly B Delk², Jennifer C Lai¹

Affiliations

¹ Division of Gastroenterology and Hepatology, Department of Medicine, University of California - San Francisco, San Francisco, CA.
² Section of Gastroenterology and Hepatology, Department of Medicine, Tulane University School of Medicine, New Orleans, LA.

Abstract

Importance: Large language models (LLMs) have proven useful for extracting data from publicly available sources, but their uses in clinical settings and with clinical data are unknown.

Objective: To determine the accuracy of data extraction using "Versa Chat," a chat implementation of the general-purpose OpenAI gpt-35-turbo LLM, versus manual chart review for hepatocellular carcinoma (HCC) imaging reports.

Design: We engineered a prompt for the data extraction task of six distinct data elements and input 182 abdominal imaging reports that were also manually tagged. We evaluated performance by calculating accuracy, precision, recall, and F1 scores.

Setting/participants: Cross-sectional abdominal imaging reports of patients diagnosed with hepatocellular carcinoma enrolled in the Functional Assessment in Liver Transplantation (FrAILT) study.
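
The evaluation named under Design (accuracy, precision, recall, and F1 against manually tagged reports) can be illustrated with a minimal sketch. It assumes each data element can be treated as a binary label per report, which the abstract does not state; the function name `score`, the `Metrics` container, and the toy labels are hypothetical and are not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score(manual: Sequence[bool], extracted: Sequence[bool]) -> Metrics:
    """Compare extracted labels with manual chart review labels (the reference)."""
    if len(manual) != len(extracted):
        raise ValueError("manual and extracted must have the same length")

    # Confusion-matrix counts, with manual review treated as ground truth.
    tp = sum(1 for m, e in zip(manual, extracted) if m and e)
    tn = sum(1 for m, e in zip(manual, extracted) if not m and not e)
    fp = sum(1 for m, e in zip(manual, extracted) if not m and e)
    fn = sum(1 for m, e in zip(manual, extracted) if m and not e)

    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f1)


# Toy labels for one hypothetical data element across six reports.
manual_labels = [True, True, False, False, True, False]
llm_labels = [True, False, False, True, True, False]
print(score(manual_labels, llm_labels))
```

In a setup like this, the function would be called once per data element, giving one set of metrics for each of the six elements.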

Publication types

Preprint
