Discovering the molecular differences between right- and left-sided colon cancer using machine learning methods

BMC Cancer. 2020 Oct 19;20(1):1012. doi: 10.1186/s12885-020-07507-8.

Abstract

Background: In recent years, the differences between left-sided colon cancer (LCC) and right-sided colon cancer (RCC) have received increasing attention due to the clinicopathological variation between them. However, some of these differences have remained unclear and conflicting results have been reported.

Methods: From The Cancer Genome Atlas (TCGA), we obtained RNA sequencing data and gene mutation data for 323 and 283 colon cancer patients, respectively. Differential analysis was first performed separately on the gene expression data and the mutation data to compare LCC and RCC. Machine learning (ML) methods were then used to select key genes or mutations as features for models that classify patients as LCC or RCC. Finally, we used logistic regression (LR) models to identify correlations between differentially expressed genes (DEGs) and mutations.
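The LR-based correlation step can be illustrated with a minimal sketch. It assumes mutation status is the binary outcome and the standardized expression of a single gene is the predictor; the abstract does not specify the model direction, covariates, or software, so the function `fit_logistic`, its parameters, and the toy values below are hypothetical.

```python
import math

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by batch gradient descent.

    Illustrative helper only; not the authors' implementation.
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi          # gradient of the log-loss w.r.t. b0
            g1 += (p - yi) * xi   # gradient of the log-loss w.r.t. b1
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical toy data (not from TCGA): standardized expression of one gene
# and the mutation status of one variant (1 = mutated, 0 = wild type).
expression = [-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.9, 1.3]
mutated = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(expression, mutated)
odds_ratio = math.exp(b1)  # change in odds of mutation per unit of expression
print(f"slope={b1:.3f}, odds ratio={odds_ratio:.3f}")
```

A positive slope (odds ratio above 1) in such a model would indicate that higher expression is associated with the mutation; a real analysis would also report a significance test and correct for multiple comparisons.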

Results: We found distinct gene mutation and expression patterns between LCC and RCC patients and further selected the 30 most important mutations and the 17 most important gene expression features using ML methods. The classification models built on these features classified LCC and RCC patients with high accuracy (areas under the curve (AUC) of 0.8 and 0.96 for mutation and gene expression data, respectively). PRAC1 expression and the BRAF V600E mutation (rs113488022) were the most important features in the gene expression and mutation models, respectively. Correlations between mutations and gene expression were also identified using LR models. Among the mutations, rs113488022 was significantly associated with the expression of four genes and should therefore be a focus of further study.
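As a reminder of what the reported AUC values measure, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. Treating RCC as the positive class is an assumption made here for illustration, and the function `auc` and the scores are hypothetical, not data from the study.

```python
def auc(scores, labels):
    """Pairwise AUC: P(score of a positive > score of a negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores; label 1 = RCC (assumed positive class), 0 = LCC.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]

print(round(auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why 0.96 for the expression model indicates much cleaner separation than 0.8 for the mutation model.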

Conclusions: Using ML methods, we identified key molecular differences between LCC and RCC that could distinguish the two groups of patients with high accuracy. These differences might be key factors underlying the variation in clinical features between LCC and RCC and could thus help improve treatment, for example by guiding the choice of appropriate therapy for patients.

Keywords: Gene expression; Left-sided colon cancer; Machine learning; Mutations; Right-sided colon cancer.

Publication types

  • Comparative Study

MeSH terms

  • Biomarkers, Tumor
  • Colonic Neoplasms / genetics
  • Colonic Neoplasms / pathology*
  • Computational Biology / methods*
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Logistic Models
  • Machine Learning
  • Male
  • Mutation*
  • Nuclear Proteins / genetics*
  • Prognosis
  • Proto-Oncogene Proteins B-raf / genetics*
  • Retrospective Studies
  • Sequence Analysis, RNA

Substances

  • Biomarkers, Tumor
  • Nuclear Proteins
  • PRAC1 protein, human
  • BRAF protein, human
  • Proto-Oncogene Proteins B-raf