Objectives: This study aims to develop cancer-associated venous thromboembolism (CA-VTE) risk prediction models using survival machine learning (ML) algorithms.
Methods: This study employed a double-cohort design (retrospective and prospective). The retrospective cohort (n = 1036) was split into a training set (70.0%, n = 725) and an internal validation set (30.0%, n = 311), while the prospective cohort (n = 321) served as the external validation set. Seven survival ML algorithms (Cox regression; classification and regression survival tree; random survival forest; gradient boosting survival machine; extreme gradient boosting survival tree; survival support vector machine; and survival artificial neural network) were applied to train the CA-VTE models.
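The paper's own code is not shown; as a minimal illustrative sketch (not the authors' implementation), the baseline among the seven algorithms, Cox proportional hazards regression, can be fitted by gradient ascent on the Breslow partial log-likelihood. The toy data, variable names, and learning settings below are assumptions for demonstration only.

```python
import numpy as np

def cox_fit(X, time, event, lr=0.1, n_iter=500):
    """Estimate Cox PH coefficients by maximizing the partial log-likelihood."""
    n, p = X.shape
    beta = np.zeros(p)
    order = np.argsort(-time)            # sort by descending survival time
    X, time, event = X[order], time[order], event[order]
    for _ in range(n_iter):
        risk = np.exp(X @ beta)
        # after the descending sort, cumulative sums up to i cover exactly
        # the risk set {j : time_j >= time_i}
        cum_risk = np.cumsum(risk)
        cum_xr = np.cumsum(X * risk[:, None], axis=0)
        grad = np.zeros(p)
        for i in range(n):
            if event[i]:
                grad += X[i] - cum_xr[i] / cum_risk[i]
        beta += lr * grad / n            # averaged gradient ascent step
    return beta

# Toy data (assumed): true coefficient 1.0, so higher x -> higher hazard
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
t = rng.exponential(scale=np.exp(-1.0 * x[:, 0]))
e = np.ones(200, dtype=bool)             # no censoring in this sketch
beta_hat = cox_fit(x, t, e)
print(beta_hat)                          # should land near the true value 1.0
```

In practice the tree-based and neural survival models listed above would be trained with dedicated libraries rather than hand-rolled optimizers; this sketch only shows the partial-likelihood principle they all extend.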
Results: Univariate analysis and LASSO-Cox regression both selected five predictors: age, previous VTE history, ICU/CCU admission, Charlson Comorbidity Index (CCI), and D-dimer. All seven survival ML models (C-index: 0.709-0.760; Brier Score: 0.212-0.243) outperformed the Khorana Score (C-index: 0.632; Brier Score: 0.260) in the external validation set. Among all models, the COX_DD model (Cox regression plus D-dimer) performed best. However, when predicting CA-VTE risk at day 7 of hospitalization, both the ML models and the Khorana Score showed Brier Scores increasing above 0.25, indicating poor calibration.
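The discrimination metric reported above, Harrell's concordance index (C-index), is the fraction of usable patient pairs in which the model assigns the higher risk score to the patient who experiences the event earlier (0.5 is chance, 1.0 is perfect). A minimal sketch, with assumed toy values rather than study data:

```python
def c_index(time, event, risk):
    """Harrell's C-index: concordant usable pairs / all usable pairs."""
    concordant, permissible = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is usable only if the earlier time is an observed event
            if time[i] < time[j] and event[i]:
                permissible += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5   # ties in predicted risk count half
    return concordant / permissible

# Toy example: shorter survival gets higher predicted risk, so C = 1.0
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]                   # 0 marks a censored observation
risks = [0.9, 0.7, 0.4, 0.2]
print(c_index(times, events, risks))    # prints 1.0
```

The Brier Score mentioned alongside it is complementary: it measures calibration as the mean squared difference between predicted event probabilities and observed outcomes at a given time point, which is why values above 0.25 at day 7 signal poor calibration.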
Conclusions: The CA-VTE risk prediction models developed in this study with seven survival ML algorithms outperformed the Khorana Score, and incorporating D-dimer further improved model performance. Applying the nomogram based on the optimal COX_DD model allows oncology nurses to reassess CA-VTE risk weekly. The prediction models developed using survival ML algorithms may contribute to dynamic and accurate CA-VTE risk assessment for cancer survivors.
Keywords: Decision making; Neoplasms; Risk stratification; Survival machine learning algorithm; Venous thromboembolism.
© 2025 The Author(s).