Int J Med Inform. 2023 Sep:177:105115.
doi: 10.1016/j.ijmedinf.2023.105115. Epub 2023 Jun 5.

Generalizability and portability of natural language processing system to extract individual social risk factors

Tanja Magoc et al. Int J Med Inform. 2023 Sep.

Abstract

Objective: The objective of this study is to validate and report on portability and generalizability of a Natural Language Processing (NLP) method to extract individual social factors from clinical notes, which was originally developed at a different institution.

Materials and methods: A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was applied to all notes written during 6 months at another institution. Ten percent of the notes classified as positive by the NLP model, and the same number of notes classified as negative, were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated.
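The four measures named above follow directly from a confusion matrix of NLP labels against manual annotations. The snippet below is a minimal sketch of that arithmetic; the class name, function name, and the counts are hypothetical illustrations and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of NLP labels compared against manual annotation."""
    tp: int  # NLP positive, annotator positive
    fp: int  # NLP positive, annotator negative
    tn: int  # NLP negative, annotator negative
    fn: int  # NLP negative, annotator positive


def validation_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, PPV, sensitivity, and specificity as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts for illustration only (not the study's data).
example = ConfusionCounts(tp=90, fp=10, tn=95, fn=5)
metrics = validation_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```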

Results: More than 6 million notes were processed at the receiving site by the NLP model, of which about 13,000 and 19,000 were classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset, with all measures over 0.87 for both social factors.

Discussion: Our study illustrated the need to accommodate institution-specific note-writing templates as well as clinical terminology of emergent diseases when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed superior performance to similar generalizability studies for extracting social factors.
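To illustrate why a rule-based state machine can be ported with modest changes, the sketch below classifies a note by moving through scan, candidate, accept, and reject states, with the institution-specific template prefixes passed in as data. It is not the authors' nDepth model: the trigger phrases, negation cues, the template prefix, and all names are hypothetical assumptions chosen for illustration.

```python
import re
from enum import Enum, auto


class State(Enum):
    SCAN = auto()       # looking for a trigger phrase
    CANDIDATE = auto()  # trigger found, context not yet checked
    ACCEPT = auto()     # note is positive for the social factor
    REJECT = auto()     # note is negative for the social factor


# Hypothetical rules; a real system would use a much larger curated set.
TRIGGER = re.compile(r"\b(homeless|evicted|cannot afford|can't afford)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(denies|not|no)\b", re.IGNORECASE)

# Hypothetical site-specific template prefix (text pulled from structured fields).
SITE_TEMPLATE_PREFIXES = ("financial resource strain:",)


def classify_note(text: str, template_prefixes: tuple = SITE_TEMPLATE_PREFIXES) -> State:
    """Return State.ACCEPT if any non-template line has an un-negated trigger."""
    state = State.SCAN
    for line in text.splitlines():
        stripped = line.strip()
        # Skip template text that duplicates structured data at this site.
        if stripped.lower().startswith(template_prefixes):
            continue
        match = TRIGGER.search(stripped)
        if match is None:
            continue
        state = State.CANDIDATE
        # A negation cue before the trigger sends the machine back to scanning.
        if NEGATION.search(stripped[:match.start()]):
            state = State.SCAN
            continue
        state = State.ACCEPT
        break
    return state if state is State.ACCEPT else State.REJECT


print(classify_note("Patient is homeless and sleeping in a car."))
print(classify_note("Patient denies being homeless."))
print(classify_note("Financial Resource Strain: cannot afford medications"))
```

In this sketch, adapting to a new site means changing the `template_prefixes` argument and the rule lists; the state transitions themselves stay the same.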

Conclusion: A rule-based NLP model to extract social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.

Keywords: Generalizability; Natural language processing; Portability; Rule-based; Social risk factors.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1. nDepth notes processing steps. Notes: 1. nDepth engine takes a set of notes in JavaScript Object Notation (JSON) format and processes them in parallel. 2. Solr is an open-source Apache software that indexes the input and enables quick search for required phrases or patterns. 3. The ‘accept’ state indicates a note as ‘positive’ for a specific social factor; otherwise, 4. The ‘reject’ state indicates a note as ‘negative’.
Figure 2. Porting and generalizing to UF.
Figure 3. Examples of text that indicates financial insecurity (the first three examples) and housing instability (the last three examples).
Figure 4. An example of template text that is pulled into a note from structured fields in the EHR. Even though this text shows that the person has financial strain, it is ignored by manual annotators and the NLP model since this data already exists in structured format, and the purpose of the NLP model is to supplement readily available information.
Figure 5. The percent of top note types classified as positive by NLP model for each social factor category.
