Generalizability and portability of natural language processing system to extract individual social risk factors
- PMID: 37302362
- PMCID: PMC11164320
- DOI: 10.1016/j.ijmedinf.2023.105115
Abstract
Objective: To validate and report on the portability and generalizability of a natural language processing (NLP) method, originally developed at a different institution, for extracting individual social factors from clinical notes.
Materials and methods: A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was applied to all notes written during a 6-month period at another institution. Ten percent of the notes classified as positive by the NLP model, and an equal number of notes classified as negative, were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated.
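The four reported measures follow from a standard confusion matrix built by comparing NLP labels with manual annotations. A minimal sketch, assuming binary note-level labels; the counts below are hypothetical placeholders for illustration and are not results from this study:

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # NLP positive, annotator positive
    fp: int  # NLP positive, annotator negative
    tn: int  # NLP negative, annotator negative
    fn: int  # NLP negative, annotator positive


def validation_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, PPV, sensitivity, and specificity from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, NOT taken from the study.
counts = ConfusionCounts(tp=90, fp=10, tn=95, fn=5)
metrics = validation_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the annotated sample contains equal numbers of NLP-positive and NLP-negative notes, measures computed this way describe the balanced validation set rather than the prevalence in the full note corpus.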
Results: The NLP model processed more than 6 million notes at the receiving site, of which about 13,000 and 19,000 were classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset, with all measures over 0.87 for both social factors.
Discussion: Our study illustrated the need to accommodate institution-specific note-writing templates, as well as the clinical terminology of emergent diseases, when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed performance superior to that of similar generalizability studies for extracting social factors.
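To illustrate why a deterministic state machine can be ported with small, local changes, here is a toy sketch. It is not the authors' model: the states, the `SiteConfig` class, the negation window, and every term in the lexicons are hypothetical assumptions chosen only to show how site-specific vocabulary can live in configuration while the transition logic stays unchanged.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    # Hypothetical lexicons; the study's actual rules are not reproduced here.
    triggers: frozenset = frozenset({"homeless", "eviction"})
    negations: frozenset = frozenset({"no", "denies", "not"})
    negation_window: int = 3  # tokens after a negation cue that stay negated


def classify_note(text: str, config: SiteConfig) -> bool:
    """Return True if a trigger term occurs outside a negation window."""
    state, remaining = "SCAN", 0
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in config.negations:
            # Enter (or restart) the negated state.
            state, remaining = "NEGATED", config.negation_window
        elif token in config.triggers:
            if state == "SCAN":
                return True  # affirmed mention found
            state, remaining = "SCAN", 0  # negated mention, keep scanning
        elif state == "NEGATED":
            remaining -= 1
            if remaining == 0:
                state = "SCAN"  # negation scope has expired
    return False


SOURCE_SITE = SiteConfig()
# Porting step: the receiving site only extends the lexicon.
RECEIVING_SITE = SiteConfig(triggers=SOURCE_SITE.triggers | {"unhoused"})

print(classify_note("Patient is currently unhoused.", SOURCE_SITE))     # False
print(classify_note("Patient is currently unhoused.", RECEIVING_SITE))  # True
print(classify_note("Patient denies eviction.", RECEIVING_SITE))        # False
```

In this sketch, adapting to a new site means editing `SiteConfig` only; the same idea would extend to skipping institution-specific template text, which is omitted here for brevity.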
Conclusion: A rule-based NLP model for extracting social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.
Keywords: Generalizability; Natural language processing; Portability; Rule-based; Social risk factors.
Copyright © 2023 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
