Clinical concept extraction: A methodology review

J Biomed Inform. 2020 Sep:109:103526. doi: 10.1016/j.jbi.2020.103526. Epub 2020 Aug 6.


Background: Concept extraction, a subdomain of natural language processing (NLP) focused on extracting concepts of interest, has been adopted to computationally extract clinical information from text for applications ranging from clinical decision support to care quality improvement.
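As a concrete illustration of what "extracting concepts of interest" from clinical text can mean, the following is a minimal sketch of dictionary lookup, one simple way to do it. It is not a method taken from this review. The lexicon, the concept labels, and the function names (`build_pattern`, `extract_concepts`) are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon (assumption, for illustration only):
# lowercase surface form -> concept label.
LEXICON = {
    "type 2 diabetes": "DIABETES_MELLITUS_TYPE_2",
    "diabetes": "DIABETES_MELLITUS",
    "hypertension": "HYPERTENSION",
    "metformin": "METFORMIN",
}


def build_pattern(lexicon):
    """Compile one case-insensitive pattern covering every lexicon term."""
    # Longest terms come first so "type 2 diabetes" is tried before "diabetes".
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def extract_concepts(text, lexicon=LEXICON):
    """Return (start, end, matched_text, concept_label) for each mention."""
    pattern = build_pattern(lexicon)
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


note = "Patient with Type 2 diabetes and hypertension, started on metformin."
for mention in extract_concepts(note):
    print(mention)
```

A lookup like this ignores negation, abbreviations, misspellings, and context, which is part of why rule-based, machine learning, and deep learning approaches are all used for this task.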

Objectives: This literature review examines the methodology of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications.

Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted to retrieve electronic health record (EHR)-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library.

Results: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications are discussed in this review.

Keywords: Concept extraction; Deep learning; Electronic health records; Information extraction; Machine learning; Natural language processing.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Bibliometrics
  • Information Storage and Retrieval*
  • Natural Language Processing*
  • Research Design