Comparison of Natural Language Processing Techniques in Analysis of Sparse Clinical Data: Insulin Decline by Patients

AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:610-619. eCollection 2019.


We present a comparative evaluation of several popular Natural Language Processing (NLP) approaches to Information Extraction (IE) from clinical documents, applied to detecting cases in which patients decline medication recommended by their providers. Specifically, we address the task of identifying diabetics who decline insulin, using a training set of 51k randomly selected provider notes. Analysis shows that decline of insulin by patients is a rare phenomenon, with a document-level prevalence of approximately 0.1%. We examine the effectiveness of three widely used IE approaches: sentence-level classification with support vector machines (SVMs), token-level sequence labelling with conditional random fields (CRFs), and rule-based detection that encodes human knowledge. Our results on a held-out test set show that the rule-based approach generalizes best (F1=0.97), outperforming the SVM (F1=0.61) and CRF (F1=0.40) models.
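To illustrate what rule-based detection and F1 scoring involve, the following is a minimal sketch and not the rules or evaluation code used in this study. The pattern `DECLINE_PATTERN`, the helper names `flags_insulin_decline` and `f1_score`, and the example notes are all hypothetical; a real rule set would need to handle negation, word order, and many more phrasings.

```python
import re

# Hypothetical rule: a decline/refuse verb followed by "insulin"
# within the same sentence (at most 40 characters apart).
DECLINE_PATTERN = re.compile(
    r"\b(?:declin\w*|refus\w*)\b[^.]{0,40}?\binsulin\b",
    re.IGNORECASE,
)


def flags_insulin_decline(note: str) -> bool:
    """Return True if the note matches the illustrative decline rule."""
    return DECLINE_PATTERN.search(note) is not None


def f1_score(predicted: list[bool], actual: list[bool]) -> float:
    """Compute F1 as the harmonic mean of precision and recall."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example notes, not taken from the study data.
notes = [
    "Patient declined to start insulin.",
    "Patient continues insulin glargine at bedtime.",
    "Refuses insulin despite counseling.",
]
predictions = [flags_insulin_decline(n) for n in notes]
```

With a prevalence near 0.1%, accuracy would be dominated by the negative class, which is why F1 on the positive class is the more informative measure for a task like this one.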