Machine Learning Algorithms to Predict Colistin-Induced Nephrotoxicity from Electronic Health Records in Patients with Multidrug-Resistant Gram-Negative Infection

Ling-Wan Chiu; Yi-En Ku; Horng-Jiun Chao; Wen-Nung Lie; Fan Ying Chan; San-Yuan Wang; Wan-Chen Shen; Hsiang-Yin Chen

doi:10.1016/j.ijantimicag.2024.107175

Int J Antimicrob Agents. 2024 Apr 18:107175. doi: 10.1016/j.ijantimicag.2024.107175. Online ahead of print.

Authors

Ling-Wan Chiu¹, Yi-En Ku², Horng-Jiun Chao², Wen-Nung Lie³, Fan Ying Chan², San-Yuan Wang⁴, Wan-Chen Shen¹, Hsiang-Yin Chen⁵

Affiliations

¹ Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan.
² Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan.
³ Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan.
⁴ Pharmacogenomics and Pharmacoproteomics, College of Pharmacy, Taipei Medical University, Taipei, Taiwan.
⁵ Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan. Electronic address: shawn@tmu.edu.tw.

PMID: 38642812

Abstract

Objectives: Colistin-induced nephrotoxicity prolongs hospitalization and increases mortality. This study aimed to construct machine learning models to predict colistin-induced nephrotoxicity in patients with multidrug-resistant Gram-negative infection.

Methods: Patients who received colistin at three hospitals, identified from the Clinical Research Database, were included. Data were divided into a derivation cohort (2011-2017) and a temporal validation cohort (2018-2020). Fifteen machine learning models were built using categorical boosting, light gradient boosting machine, and random forest. Classifier performance was compared using sensitivity, F1 score, Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). SHapley Additive exPlanations (SHAP) plots were drawn to examine feature importance and interactions.
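For readers less familiar with the threshold-based metrics named in the Methods, the following is a minimal sketch of how sensitivity, F1 score, and MCC follow from a confusion matrix. The function name `binary_metrics` and the toy labels are illustrative assumptions, not the study's code or data; labels are assumed to be coded 1 for nephrotoxicity and 0 otherwise.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return sensitivity, F1 and MCC for 0/1 labels (1 = nephrotoxicity).

    Hypothetical helper for illustration only; not taken from the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Sensitivity (recall): share of true cases that the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of flagged patients who are true cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # MCC: correlation between predicted and observed labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"sensitivity": sensitivity, "f1": f1, "mcc": mcc}


# Toy example with made-up labels (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(toy_true, toy_pred))
```

Unlike these three metrics, AUROC and AUPRC are computed from predicted probabilities across all thresholds, so they are not covered by this sketch.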

Results: The study included 1392 patients, with 360 (36.4%) and 165 (40.9%) experiencing nephrotoxicity in the derivation and temporal validation cohorts, respectively. Categorical boosting with oversampling achieved the highest performance, with a sensitivity of 0.860, an F1 score of 0.740, an MCC of 0.533, an AUROC of 0.823, and an AUPRC of 0.737. Feature importance analysis showed that the days of colistin use, cumulative dose, daily dose, latest C-reactive protein, and baseline hemoglobin were the most important risk factors, especially for vulnerable patients. A cutoff colistin dose of 4.0 mg/kg body weight/day was identified for patients at higher risk of nephrotoxicity.
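The abstract reports sensitivity and F1 score but not precision. Because F1 is the harmonic mean of precision and sensitivity, an approximate precision can be back-calculated from the two reported values. The sketch below does this; the function name `implied_precision` is a hypothetical helper, and the result is only approximate because the reported inputs are rounded to three decimals.

```python
def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R.

    Rearranging gives P = F1 * R / (2R - F1). Hypothetical helper for
    illustration; the study does not report this calculation.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 and recall are inconsistent (F1 must be below 2 * recall)")
    return f1 * recall / denom


# Reported values for categorical boosting with oversampling.
print(round(implied_precision(0.740, 0.860), 3))  # approximately 0.649
```

Under these assumptions, roughly two in three patients flagged by the best model would be true nephrotoxicity cases, which is the expected trade-off for a model tuned toward high sensitivity.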

Conclusions: Machine learning techniques can serve as an early identification tool for colistin-induced nephrotoxicity. The observed interactions suggest that dose adjustment guidelines may need modification. Future geographic and prospective validation studies are warranted to strengthen real-world applicability.

Keywords: Catboost; colistin; machine learning; nephrotoxicity; resampling; support vector machine.