Background: This study aimed at predicting the survival status on non-small cell lung cancer patients with the phenotypic radiomics features obtained from the CT images.
Methods: A total of 186 patients' CT images were used for feature extraction via Pyradiomics. The minority group was balanced via SMOTE method. The final dataset was randomized into training set (n = 223) and validation set (n = 75) with the ratio of 3:1. Multiple random forest models were trained applying hyperparameters grid search with 10-fold cross-validation using precision or recall as evaluation standard. Then a decision threshold was searched on the selected model. The final model was evaluated through ROC curve and prediction accuracy.
Results: From those segmented images of 186 patients, 1218 features were obtained via feature extraction. The preferred model was selected with recall as evaluation standard and the optimal decision threshold was set 0.56. The model had a prediction accuracy of 89.33% and the AUC score was 0.9296.
Conclusion: A hyperparameters tuning random forest classifier had greater performance in predicting the survival status of non-small cell lung cancer patients, which could be taken for an automated classifier promising to stratify patients.
Keywords: CT; Non-small cell lung cancer; Radiomics; Random forest; Survival status.