Introduction: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Although NLP offers a faster alternative to manual extraction, the validity of the extracted data remains uncertain. The current study compared manual and automated data extraction from the EHRs of patients with advanced lung cancer.
Methods: Previously, we extracted data from the EHRs of 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of the 333 patients who received systemic therapy were randomly selected, and their clinical data were manually extracted by two trained abstractors using the same accepted gold standard feature definitions as the automated extraction, covering patient characteristics, disease characteristics, and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between the automated and manual methods are reported.
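The abstract reports accuracy and concordance without defining them. A minimal sketch, assuming accuracy is the percentage of cases in which a method's value matches the adjudicator's value, and concordance is simple percent agreement between the NLP and manual values, could look like the following. The FeatureRecord structure, the function names, and the toy performance status values are hypothetical illustrations, not data or code from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FeatureRecord:
    """One feature for one patient (hypothetical structure, not from the study)."""
    nlp: str        # value extracted automatically
    manual: str     # value extracted by the human abstractor
    reference: str  # adjudicated value, assumed here to be the ground truth


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage; reject empty inputs."""
    if denominator == 0:
        raise ValueError("cannot compute a percentage over zero records")
    return 100.0 * numerator / denominator


def accuracy(records: Sequence[FeatureRecord], method: str) -> float:
    """Percent of records where the given method ('nlp' or 'manual') matches the reference."""
    if method not in ("nlp", "manual"):
        raise ValueError(f"unknown method: {method}")
    hits = sum(getattr(r, method) == r.reference for r in records)
    return percent(hits, len(records))


def concordance(records: Sequence[FeatureRecord]) -> float:
    """Percent of records where NLP and manual extraction agree with each other."""
    agree = sum(r.nlp == r.manual for r in records)
    return percent(agree, len(records))


# Toy performance status values for four hypothetical patients.
records = [
    FeatureRecord(nlp="1", manual="1", reference="1"),
    FeatureRecord(nlp="0", manual="1", reference="0"),
    FeatureRecord(nlp="2", manual="2", reference="1"),
    FeatureRecord(nlp="1", manual="unknown", reference="1"),
]
print(accuracy(records, "nlp"), accuracy(records, "manual"), concordance(records))
# prints: 75.0 25.0 50.0
```

The third toy record shows why the two measures are reported separately under these assumed definitions: the methods agree with each other there, yet both disagree with the adjudicated value, so agreement alone does not establish accuracy.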
Results: Automated extraction required considerably less time (<1 day) than manual extraction (∼225 person-hr). Demographic data (age, sex, diagnosis) were collected with high accuracy and concordance by both methods (96%-100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in the EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%-94%; manual accuracy: 78%-94%; concordance: 71%-82%). Concurrent medications (86%-100%) and comorbid conditions (96%-100%) were reported with high accuracy and concordance. Treatment details were also accurately captured by both methods (accuracy 84%-100%) and were highly concordant (83%-99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%-98%), although detection of biomarker test results was more variable (accuracy 84%-100%, concordance 84%-99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. Metastatic sites, one such feature, were generally identified more accurately by NLP extraction (NLP: 88%-99%; manual: 71%-100%; concordance: 70%-99%), with the exception of lung and lymph node metastases (NLP: 66%-71%; manual: 87%-92%; concordance: 58%), because analogous terms used in radiology reports were not included in the accepted gold standard definition.
Conclusions: Automated data abstraction from EHRs is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHR data and the use of analogous terms beyond the accepted gold standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction.
Keywords: Artificial intelligence; Health records; Natural language processing; Real-world data; Real-world evidence; Validation.
© 2022 The Authors.