An interpretable deep learning framework for predicting liver metastases in postoperative colorectal cancer patients using natural language processing and clinical data integration

Jia Li; Xinghao Wang; Linkun Cai; Jing Sun; Zhenghan Yang; Wenjuan Liu; Zhenchang Wang; Han Lv

doi:10.1002/cam4.6523


Cancer Med. 2023 Sep;12(18):19337-19351. doi: 10.1002/cam4.6523. Epub 2023 Sep 11.

Authors

Jia Li¹, Xinghao Wang¹, Linkun Cai¹˒², Jing Sun¹, Zhenghan Yang¹, Wenjuan Liu¹˒³, Zhenchang Wang¹˒², Han Lv¹

Affiliations

¹ Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, People's Republic of China.
² School of Biological Science and Medical Engineering, Beihang University, Beijing, People's Republic of China.
³ Department of Radiology, Aerospace Center Hospital, Beijing, People's Republic of China.

Abstract

Background: Liver metastasis (LM) substantially increases the risk of death in postoperative colorectal cancer (CRC) patients, which calls for new approaches to predicting LM.

Aim: To develop an interpretable fusion model that integrates free-text medical record data and structured laboratory data to predict LM in postoperative CRC patients.

Methods: Using a dataset of 1463 patients, we applied natural language processing (NLP) and machine learning techniques to construct a two-layer fusion framework, which outperformed single-modality models. The two-tier algorithm fuses the outputs from the different data modalities, yielding balanced predictions on the test data and improving the model's predictive ability. To improve interpretability, we used Shapley additive explanations (SHAP) to quantify how much the free-text and structured clinical data contributed to the final model. Finally, to support clinical use, we constructed an NLP score-based nomogram from the top 13 valid predictors identified in the study.
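The abstract does not specify the fusion algorithm or how SHAP was configured. As a minimal sketch, assuming a stacking-style late fusion in which each first-layer model (one for free text, one for laboratory data) outputs a probability of LM and a logistic-regression meta-learner combines them, the code below fits that second layer and computes exact Shapley values for the two modality scores by enumerating coalitions, which is the quantity SHAP estimates. All function names and the toy data are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

# Assumption: stacking-style late fusion with a logistic-regression meta-learner;
# this is an illustration, not the study's published two-tier algorithm.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Second-layer output: fused P(LM) from the first-layer modality scores."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_meta_learner(X: Sequence[Sequence[float]], y: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic regression weights [bias, w_1, ..., w_k] by batch gradient descent."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # derivative of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def shapley_values(w, x, baseline, names) -> Dict[str, float]:
    """Exact Shapley values; inputs outside a coalition are set to their baseline value."""
    k = len(x)

    def value(coalition) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(k)]
        return predict(w, z)

    phi = {}
    for i in range(k):
        others = [j for j in range(k) if j != i]
        contrib = 0.0
        for size in range(len(others) + 1):
            weight = math.factorial(size) * math.factorial(k - size - 1) / math.factorial(k)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi[names[i]] = contrib
    return phi


# Hypothetical first-layer outputs per patient:
# [P(LM) from a free-text NLP model, P(LM) from a laboratory-data model].
X_train = [[0.9, 0.7], [0.8, 0.4], [0.3, 0.6], [0.2, 0.1], [0.7, 0.8],
           [0.1, 0.3], [0.6, 0.2], [0.4, 0.5], [0.8, 0.2], [0.3, 0.9]]
y_train = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # hypothetical LM labels

weights = fit_meta_learner(X_train, y_train)
baseline = [sum(col) / len(col) for col in zip(*X_train)]  # mean first-layer scores
patient = [0.85, 0.30]
phi = shapley_values(weights, patient, baseline, ["free_text", "laboratory"])
print(f"fused P(LM) = {predict(weights, patient):.3f}", phi)
```

By construction, the two Shapley values sum to the difference between the patient's fused probability and the fused probability at the baseline scores. With only two inputs, exact enumeration is cheap; SHAP explainers estimate or efficiently compute the same quantities when there are many features.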

Results: The proposed fusion models predicted LM with an accuracy of 80.8%, precision of 80.3%, recall of 80.5%, and F1 score of 80.8%.
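The abstract reports four metrics but does not state how precision, recall, and F1 were averaged across the LM and non-LM classes. A minimal sketch, assuming binary labels and macro averaging (both assumptions), shows how each metric is computed from test-set predictions; the function name and the labels are hypothetical, not study data.

```python
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall, and F1 for labels 0 (no LM) and 1 (LM)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for cls in (0, 1):
        # Per-class confusion counts, treating `cls` as the positive class.
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0  # harmonic mean
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    # Macro averaging: unweighted mean over the two classes.
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / 2,
        "recall": sum(recalls) / 2,
        "f1": sum(f1s) / 2,
    }


# Hypothetical test-set labels and predictions (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because macro-F1 averages the per-class F1 scores, it need not equal the harmonic mean of macro precision and macro recall.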

Conclusion: This fusion model is a notable advance in predicting LM in postoperative CRC patients and has the potential to improve patient outcomes and support clinical decision-making.

Keywords: artificial intelligence; bidirectional encoder representations from transformers; electronic health records; interpretable deep learning; natural language processing.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Colorectal Neoplasms* / surgery
Deep Learning*
Electronic Health Records
Humans
Natural Language Processing