Development of gene model combined with machine learning technology to predict for advanced atherosclerotic plaques

Clin Neurol Neurosurg. 2023 Aug:231:107819. doi: 10.1016/j.clineuro.2023.107819. Epub 2023 Jun 10.

Abstract

Background: Atherosclerosis, a major cause of stroke, is responsible for a quarter of deaths worldwide. In particular, rupture of late-stage plaques in large vessels such as the carotid artery can lead to serious cardiovascular disease. The aim of our study was to establish a gene-based model combined with machine learning techniques to screen out gene signatures and predict advanced atherosclerotic plaques.

Methods: The microarray datasets GSE28829 and GSE43292, publicly available from the Gene Expression Omnibus (GEO) database, were used to screen for potential predictive genes. Differentially expressed genes (DEGs) were identified with the "limma" R package. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of these DEGs were performed with Metascape. A Random Forest (RF) algorithm was then applied to further narrow the DEGs down to the 30 genes that contributed the most. The expression data of these top 30 DEGs were converted into a "Gene Score". Finally, we developed a model based on an artificial neural network (ANN) to predict advanced atherosclerotic plaques and validated it in an independent test dataset, GSE104140.
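The abstract does not state how the expression values of the top 30 DEGs were converted into a "Gene Score". One scheme seen in comparable RF-plus-ANN studies, assumed here purely for illustration, binarizes each gene against its median across samples: an upregulated gene scores 1 when its expression exceeds the median, and a downregulated gene scores 1 when its expression falls below it. The function `gene_score_matrix`, the gene names, and the toy values are hypothetical.

```python
from statistics import median


def gene_score_matrix(expr, direction):
    """Binarize expression values into hypothetical 'Gene Score' features.

    expr: dict mapping gene -> list of expression values (one per sample)
    direction: dict mapping gene -> "up" or "down" (sign of differential expression)
    Returns a dict mapping gene -> list of 0/1 scores (one per sample).
    Assumption: median-based binarization; the study's exact rule is not given.
    """
    scores = {}
    for gene, values in expr.items():
        cut = median(values)
        if direction[gene] == "up":
            # Upregulated gene: above-median expression scores 1
            scores[gene] = [1 if v > cut else 0 for v in values]
        elif direction[gene] == "down":
            # Downregulated gene: below-median expression scores 1
            scores[gene] = [1 if v < cut else 0 for v in values]
        else:
            raise ValueError(f"unknown direction for {gene}: {direction[gene]!r}")
    return scores


# Toy example: two hypothetical genes measured in four samples
expr = {"GENE_A": [2.0, 5.0, 7.0, 1.0], "GENE_B": [9.0, 3.0, 2.0, 8.0]}
direction = {"GENE_A": "up", "GENE_B": "down"}
print(gene_score_matrix(expr, direction))
# -> {'GENE_A': [0, 1, 1, 0], 'GENE_B': [0, 1, 1, 0]}
```

Under this assumption, the resulting 0/1 matrix (samples by 30 genes) would form the input layer of the ANN. Binarizing against the median makes the features less sensitive to scale differences between microarray platforms, at the cost of discarding magnitude information.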

Results: A total of 176 DEGs were identified in the training datasets. GO and KEGG enrichment analyses revealed that these genes were enriched in leukocyte-mediated immune response, cytokine-cytokine interactions, and immunoinflammatory signaling. The RF algorithm then selected the top 30 genes (25 upregulated and 5 downregulated DEGs) as predictors. The resulting model showed high predictive value in the training datasets (AUC = 0.913) and was validated in the independent dataset GSE104140 (AUC = 0.827).
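The AUC values reported above summarize how well the model's predicted scores rank advanced plaques above the comparison samples. As a sketch, and not the authors' code, the AUC can be computed from its rank-based (Mann-Whitney) definition: the fraction of all positive/negative sample pairs in which the positive sample receives the higher score, with ties counting one half. The function name `roc_auc`, the labels, and the scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random positive outscores a random negative.

    labels: sequence of 1 (advanced plaque) or 0 (comparison sample)
    scores: model outputs where a higher value means 'more likely advanced'
    Ties between a positive and a negative sample count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six samples
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# 8 of the 9 positive/negative pairs are correctly ordered, so AUC = 8/9
print(roc_auc(labels, scores))
```

The pairwise loop is O(n_pos x n_neg), which is negligible at the cohort sizes of typical microarray datasets; library routines such as scikit-learn's `roc_auc_score` return the same value more efficiently.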

Conclusion: In the present study, we established a prediction model that showed satisfactory predictive power in both the training and test datasets. In addition, this is the first study to adopt bioinformatics methods combined with machine learning techniques (RF and ANN) to explore and predict advanced atherosclerotic plaques. However, further investigation is needed to verify the screened DEGs and the predictive effectiveness of this model.

Keywords: Artificial neural network; Atherosclerotic plaques; Bioinformatics; Machine learning technology; Prediction model.

MeSH terms

  • Atherosclerosis*
  • Gene Expression Profiling / methods
  • Humans
  • Plaque, Atherosclerotic* / genetics
  • Plaque, Atherosclerotic* / metabolism
  • Signal Transduction
  • Transcriptome