With the rapid growth and widespread adoption of electronic health records (EHRs), similar patient retrieval has become an important task for downstream clinical decision support, such as diagnostic reference and treatment planning. However, the high dimensionality, large volume, and heterogeneity of EHRs make it difficult to retrieve patients whose medical conditions resemble the current case both efficiently and accurately. Several previous studies have addressed these issues with hash coding techniques, which improve retrieval efficiency but rely only on the underlying characteristics among instances to preserve retrieval accuracy. In this paper, the drug categories recorded for each instance in the EHRs are regarded as the ground truth for determining pairwise similarity. We exploit the abundant semantic information within these multi-labels and propose a novel framework named Graph-guided Deep Hashing Networks (GDHN). To capture correlation dependencies among the multi-labels, we first construct a label graph in which each node represents a drug category, and then employ a graph convolutional network (GCN) to derive the multi-label embedding of each instance. The learned multi-label embeddings are then used to guide the patient hashing process, yielding more informative and discriminative hash codes. Extensive experiments were conducted on two datasets: a real-world dataset concerning IgA nephropathy from Peking University First Hospital and a publicly available dataset from MIMIC-III. GDHN was compared with traditional hashing methods and state-of-the-art deep hashing methods using three evaluation metrics. The results demonstrate that GDHN outperforms the competitors at different hash code lengths, validating the superiority of our proposal.
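The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: two patients are treated as similar when they share at least one drug category, the label graph connects categories that co-occur in some patient, node features are one-hot, the GCN weights are random rather than learned, instance embeddings are obtained by summing the embeddings of a patient's labels, and hash codes are ±1 vectors ranked by Hamming distance. All function names are hypothetical.

```python
import numpy as np


def pairwise_similarity(labels):
    """S[i, j] = 1 if patients i and j share at least one drug category (assumed rule)."""
    return (labels @ labels.T > 0).astype(int)


def label_adjacency(labels):
    """Assumed label graph: link two drug categories that co-occur in any patient."""
    co_occurrence = labels.T @ labels
    adj = (co_occurrence > 0).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep each node's own features
    return adj


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^{-1/2} A D^{-1/2} X W)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query (closest first)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists


# Toy data: 4 patients x 3 drug categories (multi-hot labels, made up).
labels = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1]])

S = pairwise_similarity(labels)
adj = label_adjacency(labels)

# Random weights stand in for parameters that would be learned in training.
rng = np.random.default_rng(0)
node_emb = gcn_layer(adj, np.eye(3), rng.normal(size=(3, 2)))

# Assumed pooling: a patient's label embedding is the sum over its labels.
instance_emb = labels @ node_emb

# Toy 4-bit hash codes in {-1, +1}; in practice a hashing network produces them.
db_codes = np.array([[1, 1, -1, -1],
                     [1, -1, -1, -1],
                     [-1, -1, 1, 1]])
order, dists = hamming_rank(np.array([1, 1, -1, -1]), db_codes)
```

In a trained system, the similarity matrix `S` would supervise the hashing loss, while `instance_emb` would serve as the guidance signal for the hash codes; how the two are combined is specific to GDHN and is not reproduced here.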
Keywords: Deep hashing; Electronic health records; Graph neural networks; Patient representation learning; Similar patient retrieval.
Copyright © 2023 Elsevier Ltd. All rights reserved.