Identifying the tumor location-associated candidate genes in development of new drugs for colorectal cancer using machine-learning-based approach

Med Biol Eng Comput. 2022 Oct;60(10):2877-2897. doi: 10.1007/s11517-022-02641-w. Epub 2022 Aug 10.

Abstract

Numerous studies have examined how tumor location relates to prognosis and treatment efficacy in colorectal cancer. Left- and right-sided colorectal cancers differ in their molecular pathways and prognoses, but this difference has not been fully investigated at the genomic level. In this study, a set of data science approaches, comprising six feature selection methods and three classification models, was used to predict tumor location from gene expression profiles. Classification performance was evaluated using specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC). Gene Ontology enrichment analysis was performed with the PANTHER Classification System. For the 50 most significant genes, protein-protein interactions and drug-gene interactions were analyzed using GeneMANIA, Cytoscape, CytoHubba, MCODE, and DGIdb. The highest classification accuracy (90%) was achieved with the 200 most significant genes, using the ensemble decision tree classifier together with the ReliefF feature selection method. Molecular pathways and drug interactions were then investigated for the 50 most significant genes. These results suggest that a machine-learning-based approach can help identify genes that may play an important role in the development of new therapies and drugs for colorectal cancer.
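The abstract names the ReliefF gene ranking and the four evaluation metrics but gives no formulas. The following is a minimal Python sketch, not the authors' implementation. It assumes a two-class (left- vs right-sided) labeling, min-max scaled expression values, Manhattan distance, and that every sample is used once as the ReliefF reference instance. The function names `relieff_binary`, `top_k_features`, and `binary_metrics` are hypothetical, and the ensemble decision tree classifier is not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def _min_max(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature (gene) to [0, 1] so per-feature differences are comparable."""
    n_feat = len(X[0])
    lo = [min(row[f] for row in X) for f in range(n_feat)]
    hi = [max(row[f] for row in X) for f in range(n_feat)]
    return [
        [(row[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0 for f in range(n_feat)]
        for row in X
    ]


def relieff_binary(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> List[float]:
    """Hypothetical two-class ReliefF: a higher weight means the feature separates classes better."""
    Z = _min_max(X)
    n, n_feat = len(Z), len(Z[0])
    w = [0.0] * n_feat
    for i in range(n):  # every sample serves once as the reference instance R
        # Manhattan distance from R to all other samples, nearest first
        order = sorted(
            (sum(abs(a - b) for a, b in zip(Z[i], Z[j])), j) for j in range(n) if j != i
        )
        hits = [j for _, j in order if y[j] == y[i]][:k]
        misses = [j for _, j in order if y[j] != y[i]][:k]
        for f in range(n_feat):
            # Penalize differences to same-class neighbours and reward differences to
            # other-class neighbours, averaged over the neighbours and over all n references.
            if hits:
                w[f] -= sum(abs(Z[i][f] - Z[j][f]) for j in hits) / (len(hits) * n)
            if misses:
                w[f] += sum(abs(Z[i][f] - Z[j][f]) for j in misses) / (len(misses) * n)
    return w


def top_k_features(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-weighted features (e.g. the top 200 genes)."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:k]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}
```

In a workflow like the one described, the top-ranked genes (for example, the 200 highest ReliefF weights) would be passed to a classifier, and `binary_metrics` would score its predictions on held-out samples. With only two classes, ReliefF's prior-weighted miss term reduces to a plain average over the nearest misses, which is why the sketch omits class priors.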

Keywords: Classification; Colorectal cancer; Druggable gene; Gene expression; Machine-learning; Tumor location.

MeSH terms

  • Colorectal Neoplasms* / drug therapy
  • Colorectal Neoplasms* / genetics
  • Gene Ontology
  • Humans
  • Machine Learning*