Identifying biomarkers associated with the diagnosis of ulcerative colitis via bioinformatics and machine learning

Math Biosci Eng. 2023 Apr 17;20(6):10741-10756. doi: 10.3934/mbe.2023476.

Abstract

Background: Ulcerative colitis (UC) is an idiopathic inflammatory disease with an increasing incidence. This study aimed to identify potential UC biomarkers and associated immune infiltration characteristics.

Methods: Two datasets (GSE87473 and GSE92415) were merged to obtain 193 UC samples and 42 normal samples. Using R, differentially expressed genes (DEGs) between UC and normal samples were identified, and their biological functions were investigated using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses. Promising biomarkers were identified using least absolute shrinkage and selection operator (LASSO) regression and support vector machine recursive feature elimination, and their diagnostic efficacy was evaluated through receiver operating characteristic (ROC) curves. Finally, CIBERSORT was used to investigate the immune infiltration characteristics in UC, and the relationship between the identified biomarkers and various immune cells was examined.
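As an illustration of the ROC evaluation step only, the following minimal sketch computes the area under the ROC curve (AUC) for a single candidate gene. It is not the authors' R code: the function name `roc_auc` and the expression values are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen UC sample has a higher value than a randomly chosen normal sample, with ties counted as one half).

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative samples.

    scores: expression values of one gene, one per sample
    labels: 1 for a UC sample, 0 for a normal sample
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one candidate gene (not study data).
expression = [8.1, 7.4, 9.0, 6.9, 5.2, 5.8, 7.0, 4.9]
is_uc = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(expression, is_uc)
print(f"AUC = {auc:.4f}")  # AUC = 0.9375
```

For a gene that is downregulated in UC, the same function returns a value below 0.5; negating the scores (or reporting 1 - AUC) gives the equivalent discriminative ability.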

Results: We identified 102 DEGs, of which 64 were significantly upregulated and 38 were significantly downregulated. The DEGs were enriched in pathways associated with interleukin-17, cytokine-cytokine receptor interaction and viral protein interactions with cytokines and cytokine receptors, among others. Using machine learning methods and ROC analysis, we confirmed DUOX2, DMBT1, CYP2B7P, PITX2 and DEFB1 to be essential diagnostic genes for UC. Immune cell infiltration analysis revealed that all five diagnostic genes were correlated with regulatory T cells, CD8 T cells, activated and resting memory CD4 T cells, activated natural killer cells, neutrophils, activated and resting mast cells, activated and resting dendritic cells and M0, M1 and M2 macrophages.
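The abstract does not state which correlation measure was used to relate gene expression to immune cell fractions. As a hedged sketch, assuming a rank-based (Spearman) correlation, the relationship between one gene and one immune cell type could be computed as follows. The helper names and all values are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values (not study data): expression of one gene
# and the estimated fraction of one immune cell type in the same samples.
gene_expression = [5.1, 6.3, 7.8, 8.4, 9.0]
cell_fraction = [0.02, 0.05, 0.04, 0.09, 0.12]

rho = spearman(gene_expression, cell_fraction)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is a common choice here because estimated cell fractions are bounded and often skewed, but a Pearson correlation on the raw values would follow the same pattern using `pearson` directly.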

Conclusions: DUOX2, DMBT1, CYP2B7P, PITX2 and DEFB1 were identified as prospective biomarkers for UC. These biomarkers and their relationship with immune cell infiltration may offer a new perspective on the progression of UC.

Keywords: LASSO regression model; bioinformatics; differentially expressed genes; support vector machine recursive feature elimination; ulcerative colitis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers
  • Calcium-Binding Proteins
  • Colitis, Ulcerative* / diagnosis
  • Colitis, Ulcerative* / genetics
  • Computational Biology
  • Cytokines
  • DNA-Binding Proteins
  • Dual Oxidases
  • Humans
  • Machine Learning
  • Tumor Suppressor Proteins
  • beta-Defensins*

Substances

  • Dual Oxidases
  • Biomarkers
  • Cytokines
  • DEFB1 protein, human
  • beta-Defensins
  • DMBT1 protein, human
  • Calcium-Binding Proteins
  • DNA-Binding Proteins
  • Tumor Suppressor Proteins