Accurately recording a patient's medical conditions in an electronic health record (EHR) system is the basis for effectively documenting patient health status, coding for billing, and supporting data-driven clinical decision making. However, patient conditions are often not fully captured in structured EHR data and may instead be documented only in unstructured clinical notes. The challenge is that not all disease mentions in clinical notes actually refer to the patient's conditions. We developed a two-step workflow for identifying a patient's conditions from clinical notes: disease mention extraction followed by disease mention classification. We implemented this workflow in a prototype system for disease identification, DI++. For disease mention classification in DI++, we developed a deep learning model, the CLSTM-Attention model. Extensive empirical evaluation on about one million pages of de-identified clinical notes demonstrates that DI++ has a significant performance advantage over existing systems in F1 score, Area Under the Curve (AUC), and efficiency. The proposed CLSTM-Attention model also outperforms existing deep learning models for disease mention classification.
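The two-step workflow (disease mention extraction, then disease mention classification) can be illustrated with a minimal Python sketch. This is not the DI++ implementation. The lexicon, the cue list, and the names `extract_mentions`, `classify_mention`, and `identify_conditions` are all hypothetical, and a simple cue-phrase rule stands in for the CLSTM-Attention classifier, whose architecture is not described here. The sketch also assumes that negation and family-history phrasing are typical reasons a mention does not refer to the patient's condition.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon; a real system would draw on a clinical vocabulary.
DISEASE_LEXICON = ("hypertension", "asthma", "diabetes")

# Hypothetical cue phrases suggesting a mention is negated or about someone else.
NON_PATIENT_CUES = re.compile(
    r"\b(no|denies|denied|family history|mother|father|rule out)\b",
    re.IGNORECASE,
)


@dataclass
class Mention:
    term: str      # lexicon term that matched
    sentence: str  # sentence in which the term was found


def extract_mentions(note: str) -> List[Mention]:
    """Step 1: collect every lexicon term found in the note, sentence by sentence."""
    mentions = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        for term in DISEASE_LEXICON:
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                mentions.append(Mention(term=term, sentence=sentence))
    return mentions


def classify_mention(mention: Mention) -> bool:
    """Step 2 (stand-in): True if no cue phrase appears in the mention's sentence.

    A trained classifier would replace this rule in a real system.
    """
    return NON_PATIENT_CUES.search(mention.sentence) is None


def identify_conditions(note: str) -> List[str]:
    """Run both steps and keep only mentions classified as patient conditions."""
    return [m.term for m in extract_mentions(note) if classify_mention(m)]


if __name__ == "__main__":
    note = "Patient has a history of hypertension. Denies asthma. Mother had diabetes."
    print(identify_conditions(note))
```

In this toy note, all three disease terms are extracted in the first step, and the second step filters out the negated mention and the family-history mention. Keeping the steps separate means the classifier can be swapped for a learned model without touching the extraction code.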
Keywords: Clinical notes; Concept extraction; Deep learning; Deep neural network; Disease mention extraction; Natural language processing (NLP); Patient condition classification.
Copyright © 2021. Published by Elsevier B.V.