Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review

Artif Intell Med. 2023 Dec:146:102701. doi: 10.1016/j.artmed.2023.102701. Epub 2023 Nov 1.

Abstract

Objective: Natural language processing (NLP) combined with machine learning (ML) techniques are increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care.

Methods: We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs.

Results: Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP.

Conclusions: This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.

Keywords: Electronic health records; Machine learning; Natural language processing; Patient-reported outcomes; Unstructured clinical narrative.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Systematic Review

MeSH terms

  • Electronic Health Records*
  • Humans
  • Machine Learning*
  • Natural Language Processing*
  • Patient Outcome Assessment*
  • PubMed