Screening of COVID-19 based on the extracted radiomics features from chest CT images

J Xray Sci Technol. 2021;29(2):229-243. doi: 10.3233/XST-200831.


Background and objective: Radiomics has been widely used in quantitative analysis of medical images for disease diagnosis and prognosis assessment. The objective of this study is to test a machine-learning (ML) method based on radiomics features extracted from chest CT images for screening COVID-19 cases.

Methods: The study is carried out on two groups of patients: 138 patients with confirmed COVID-19 and 140 patients with suspected COVID-19. We focus on distinguishing pneumonia caused by COVID-19 from the suspected cases by segmenting the whole lung volume and extracting 86 radiomics features. Following feature extraction, nine feature-selection procedures are used to identify valuable features. Then, ten ML classifiers are applied to classify and predict COVID-19 cases. Each ML model is trained and tested using ten-fold cross-validation. The predictive performance of each ML model is evaluated using the area under the receiver operating characteristic curve (AUC) and accuracy.
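
As a rough illustration of one selector/classifier pairing from the workflow above, the following Python sketch chains recursive feature elimination with a k-nearest neighbors classifier and scores it by ten-fold cross-validation using scikit-learn. It is a minimal sketch, not the authors' implementation: the data are synthetic stand-ins generated to match only the cohort size and feature count, and the standardization step, the logistic-regression base estimator for RFE, the choice of 10 retained features, and k = 5 are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the radiomics table (assumption): 278 patients x 86 features.
X, y = make_classification(
    n_samples=278, n_features=86, n_informative=10, random_state=0
)

# Scaling, feature selection, and classification sit in one pipeline so that
# each step is refit on the training folds only, which avoids information leakage.
pipeline = Pipeline([
    ("scale", StandardScaler()),  # assumption: features are standardized
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)),
    ("classify", KNeighborsClassifier(n_neighbors=5)),  # assumption: k = 5
])

# Ten-fold cross-validation, stratified to keep the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(pipeline, X, y, cv=cv, scoring=["accuracy", "roc_auc"])

mean_accuracy = float(np.mean(scores["test_accuracy"]))
mean_auc = float(np.mean(scores["test_roc_auc"]))
print(f"accuracy={mean_accuracy:.3f}  AUC={mean_auc:.3f}")
```

Swapping the `select` or `classify` step for another selector or classifier would give a different pairing of the kind compared in the study; the numbers printed here come from synthetic data and say nothing about the reported results.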

Results: Accuracy ranges from 0.32 (recursive feature elimination [RFE]+multinomial naive Bayes [MNB] classifier) to 0.984 (RFE+bagging [BAG] and RFE+decision tree [DT] classifiers), and AUC ranges from 0.27 (mutual information [MI]+MNB classifier) to 0.997 (RFE+k-nearest neighbors [KNN] classifier). There is no direct correlation among the number of selected features, accuracy, and AUC; however, accuracy and AUC do change as the number of selected features changes. The RFE+BAG and RFE+DT classifiers achieve the highest prediction accuracy (accuracy: 0.984), followed by the MI+Gaussian naive Bayes (GNB) and logistic regression (LGR)+DT classifiers (accuracy: 0.976). The RFE+KNN classifier achieves the highest AUC (AUC: 0.997), followed by the RFE+BAG classifier (AUC: 0.991) and the RFE+gradient boosting decision tree (GBDT) classifier (AUC: 0.99).
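
The best accuracy and the best AUC above come from different models, which is possible because accuracy depends on a decision threshold while AUC depends only on how the scores rank positive against negative cases. The pure-Python sketch below illustrates this with two hypothetical models; the labels and scores are invented for illustration and are not data from the study, and the 0.5 threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    positives = [s for s, t in zip(scores, labels) if t == 1]
    negatives = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 4 positive and 4 negative cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Model A ranks every positive above every negative, but two positives fall below 0.5.
model_a = [0.9, 0.8, 0.45, 0.4, 0.3, 0.2, 0.1, 0.05]
# Model B thresholds better, but one positive is ranked below two negatives.
model_b = [0.9, 0.8, 0.7, 0.2, 0.3, 0.25, 0.1, 0.05]

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, accuracy(labels, scores), auc(labels, scores))
# prints: A 0.75 1.0
# prints: B 0.875 0.875
```

Model A has the higher AUC and model B the higher accuracy, so which model counts as "best" depends on the metric chosen.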

Conclusion: This study demonstrates that the ML model combining RFE feature selection with a KNN classifier achieves the highest AUC in differentiating patients with confirmed COVID-19 from the suspected cases.

Keywords: COVID-19; chest CT images; machine-learning; radiomics.

MeSH terms

  • COVID-19 / diagnostic imaging*
  • Humans
  • Lung / diagnostic imaging
  • Machine Learning
  • Predictive Value of Tests
  • ROC Curve
  • Reproducibility of Results
  • SARS-CoV-2
  • Tomography, X-Ray Computed / methods*