Identification of Novel COVID-19 Biomarkers by Multiple Feature Selection Strategies

Comput Math Methods Med. 2021 Sep 27:2021:2203636. doi: 10.1155/2021/2203636. eCollection 2021.

Abstract

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been a global pandemic since it was first reported in December 2019. To date, SARS-CoV-2 nucleic acid detection is regarded as the gold standard for COVID-19 diagnosis. However, this method often yields false negatives, which leads to missed diagnoses. New biomarkers are therefore urgently needed to improve the accuracy of COVID-19 diagnosis. In this study, expression profiles were first obtained from the Gene Expression Omnibus (GEO) database. From these profiles, 500 feature genes were selected using the minimum-redundancy maximum-relevancy (mRMR) feature selection method. The incremental feature selection (IFS) method was then used to identify the best-performing classifier among support vector machine (SVM) classifiers built on different sets of feature genes, and the 66 feature genes underlying that classifier were taken as the optimal feature genes. Lastly, the optimal feature genes were subjected to GO functional enrichment analysis, principal component analysis (PCA), and protein-protein interaction (PPI) network analysis. Overall, the results suggest that the 66 feature genes can effectively distinguish COVID-19-positive from COVID-19-negative cases and may serve as new biomarkers of the disease.
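
The mRMR ranking and IFS steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: absolute Pearson correlation stands in for the mutual information that mRMR normally uses, a leave-one-out nearest-centroid classifier stands in for the SVM, and the toy matrix `X`, the labels `y`, and all function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_rank(X, y, n_select):
    """Greedy mRMR-style ranking: relevance to the label minus mean redundancy
    with already-selected features. |Pearson r| is an assumed proxy for
    mutual information."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = [abs(pearson(c, y)) for c in cols]
    selected, remaining = [], set(range(len(cols)))
    while remaining and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a stand-in for the SVM used in the study)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        pred = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked):
    """Score the top-k ranked features for k = 1..len(ranked) and keep the
    smallest k that reaches the best accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

# Hypothetical toy data: 10 samples x 3 "genes".
# Gene 0 separates the classes, gene 1 is uninformative, gene 2 is weakly informative.
X = [
    [0.0, 0.0, 0.2], [0.1, 1.0, 0.5], [0.2, 0.0, 0.1], [0.1, 1.0, 0.6], [0.0, 0.5, 0.3],
    [1.0, 0.0, 0.7], [1.1, 1.0, 0.4], [0.9, 0.0, 0.9], [1.0, 1.0, 0.5], [1.2, 0.5, 0.8],
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

ranked = mrmr_rank(X, y, n_select=3)
selected, acc = incremental_feature_selection(X, y, ranked)
print("ranked features:", ranked)
print("optimal subset:", selected, "accuracy:", acc)
```

In a setting closer to the study, the ranking would cover 500 genes and the evaluation function would train an SVM on each candidate subset. The IFS loop itself would stay the same, because it only needs a ranked list and a way to score each prefix of that list.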

MeSH terms

  • Algorithms
  • Biomarkers / metabolism*
  • COVID-19 / genetics*
  • COVID-19 / metabolism*
  • COVID-19 Testing
  • Computational Biology
  • False Negative Reactions
  • False Positive Reactions
  • Gene Expression Profiling
  • Humans
  • Machine Learning
  • Models, Statistical
  • Principal Component Analysis
  • Protein Interaction Mapping
  • Research Design
  • Sensitivity and Specificity

Substances

  • Biomarkers