Background: Social isolation is an important social determinant that affects health outcomes and mortality among patients. The National Academy of Medicine recently recommended that social isolation be documented in electronic health records (EHRs). However, social isolation is usually not recorded or obtained as coded data; instead, it is collected from patient self-report or documented in clinical narratives. This study explores the feasibility and effectiveness of a natural language processing (NLP) strategy for identifying socially isolated patients from clinical narratives.
Method: We used data from the Medical University of South Carolina (MUSC) Research Data Warehouse. Patients 18 years of age or older who were diagnosed with prostate cancer between January 1, 2014 and May 31, 2017 were eligible for this study. NLP pipelines for identifying social isolation were developed using the following note types: progress, history and physical, consult, emergency department provider, telephone encounter, discharge summary, plan of care, and radiation oncology. Of 4195 eligible prostate cancer patients, we randomly sampled 3138 patients (75%) as a training dataset. The remaining 1057 patients (25%) were used as a test dataset to evaluate NLP algorithm performance. Standard performance measures for the NLP algorithm, including precision, recall, and F-measure, were assessed by expert manual review of the test dataset.
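The patient-level random split described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the patient identifiers, the `split_patients` helper, and the seed are hypothetical, and only the counts (4195 eligible, 3138 training, 1057 test) come from the text.

```python
import random


def split_patients(patient_ids, n_train, seed=0):
    """Randomly split unique patient IDs into training and test sets.

    Splitting by patient (rather than by note) keeps all notes from one
    patient in the same partition.
    """
    ids = sorted(set(patient_ids))   # de-duplicate and fix the starting order
    rng = random.Random(seed)        # hypothetical seed, for reproducibility
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers standing in for the 4195 eligible patients.
patient_ids = [f"P{i:04d}" for i in range(4195)]
train_ids, test_ids = split_patients(patient_ids, n_train=3138)

print(len(train_ids), len(test_ids))
```

Splitting on patient identifiers rather than on individual notes avoids having notes from the same patient appear in both the training and test datasets.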
Results: A total of 55,516 clinical notes from 3138 patients were included to develop the lexicon and NLP pipelines for social isolation. Of those patients, 35 (1.2%) had mentions of social isolation, in 217 notes. Among 24 terms relevant to social isolation, the most prevalent were "lack of social support," "lonely," "social isolation," "no friends," and "loneliness." Among 1057 patients in the test dataset, 17 patients (1.6%) were identified as having mentions of social isolation, in 40 clinical notes. Manual review identified four false positive mentions of social isolation, and one false negative in 154 notes from 52 randomly selected controls. The NLP pipeline demonstrated 90% precision, 97% recall, and 93% F-measure. The major causes of false positives were ambiguity about who was experiencing social isolation, negation, and alternate word meanings.
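The reported performance figures can be checked against the standard definitions of precision, recall, and F-measure. The sketch below rests on an assumption that the abstract does not state explicitly: that the 40 flagged test notes are the denominator for precision (giving 36 true positives once the four false positives are removed) and that the single false negative is the only missed mention. The function names are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


# Assumed note-level counts (an interpretation of the reported figures).
flagged_notes = 40
false_positives = 4
false_negatives = 1
true_positives = flagged_notes - false_positives  # 36

p, r = precision_recall(true_positives, false_positives, false_negatives)
f = f_measure(p, r)
print(f"from counts:   precision={p:.3f} recall={r:.3f} F={f:.3f}")

# F-measure recomputed from the rounded precision and recall as reported.
f_reported = f_measure(0.90, 0.97)
print(f"from reported: F={f_reported:.3f}")
```

Under these assumed counts, precision and recall come out at 0.900 and 0.973, matching the reported 90% and 97%. The F-measure is about 0.935 from the counts and about 0.934 from the rounded values, so both are close to the reported 93%.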
Conclusions: Our NLP algorithms provide a highly accurate approach to identifying social isolation in clinical narratives.