Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis

Comput Biol Med. 2022 Jun:145:105409. doi: 10.1016/j.compbiomed.2022.105409. Epub 2022 Mar 19.


Advanced metastasis makes colon cancer more difficult to treat. Identifying markers of colon cancer allows the stage of the disease to be diagnosed in time and the prognosis to be improved through timely treatment. This paper uses gene expression profiling data from The Cancer Genome Atlas (TCGA) for the diagnosis of colon cancer and its staging. We first selected the gene modules most strongly correlated with cancer by Weighted Gene Co-expression Network Analysis (WGCNA), extracted characteristic genes from the differential expression results using the least absolute shrinkage and selection operator (Lasso) algorithm, and performed survival analysis. The module genes were then combined with the Lasso-extracted feature genes to diagnose colon cancer versus healthy controls using random forest (RF), support vector machine (SVM) and decision tree classifiers, and colon cancer staging was diagnosed using the differentially expressed genes for each stage. A protein-protein interaction (PPI) network was then constructed for 289 genes to identify clusters of aggregated proteins for survival analysis. The RF model gave the best results. In cross-validation for the diagnosis of colon cancer versus the control group, it reached an average accuracy of 99.81%, an F1 value of 0.9968, a precision of 99.88%, and a recall of 99.5%. In the diagnosis of colon cancer stages I, II, III and IV, it reached an average accuracy of 91.5%, an F1 value of 0.7679, a precision of 86.94%, and a recall of 73.04%. Eight genes associated with colon cancer prognosis were identified: GCNT2, GLDN, SULT1B1, UGT2B15, PTGDR2, GPR15, BMP5 and CPT2.
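The classification step described in the abstract (Lasso-based gene selection followed by RF, SVM and decision tree classifiers scored under cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It makes several assumptions: the data are synthetic rather than TCGA expression profiles, scikit-learn's `Lasso` regression on 0/1 labels stands in for whichever Lasso formulation the study used, `alpha=0.05` and all model hyperparameters are arbitrary, and five folds are assumed because the abstract does not state the fold count. The WGCNA module selection, survival analysis and PPI steps are not covered.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a (samples x genes) expression matrix; NOT TCGA data.
X, y = make_classification(
    n_samples=200, n_features=300, n_informative=15, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# The three classifier families named in the abstract (hyperparameters assumed).
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="linear"),
    "DT": DecisionTreeClassifier(random_state=0),
}

# Five stratified folds are an assumption; the abstract omits the fold count.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, clf in models.items():
    # Scaling and Lasso-based selection sit inside the pipeline, so they are
    # refit on each training fold and never see the held-out samples.
    pipe = make_pipeline(
        StandardScaler(),
        SelectFromModel(Lasso(alpha=0.05)),  # keeps genes with non-zero coefficients
        clf,
    )
    scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Keeping the selector inside the pipeline is a deliberate choice in this sketch: selecting genes on the full dataset before cross-validation would leak information from the test folds and inflate the reported scores. Whether the study applied selection inside or outside the folds is not stated in the abstract.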

Keywords: Colon cancer; Machine learning; PPI; Prognosis; Staging; WGCNA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / genetics
  • Colonic Neoplasms* / diagnosis
  • Colonic Neoplasms* / genetics
  • Computational Biology* / methods
  • Gene Expression Profiling / methods
  • Gene Regulatory Networks
  • Humans
  • Machine Learning
  • Receptors, G-Protein-Coupled / genetics
  • Receptors, Peptide / genetics

Substances

  • Biomarkers, Tumor
  • GPR15 protein, human
  • Receptors, G-Protein-Coupled
  • Receptors, Peptide