Salivary metabolomics with machine learning for colorectal cancer detection

Cancer Sci. 2022 Sep;113(9):3234-3243. doi: 10.1111/cas.15472. Epub 2022 Jul 8.


As the worldwide prevalence of colorectal cancer (CRC) increases, it is vital to reduce its morbidity and mortality through early detection. Saliva-based tests are an ideal noninvasive tool for CRC detection. Here, we explored and validated salivary biomarkers to distinguish patients with CRC from those with adenoma (AD) and healthy controls (HC). Saliva samples were collected from patients with CRC or AD and from HC. Untargeted salivary hydrophilic metabolite profiling was conducted using capillary electrophoresis-mass spectrometry and liquid chromatography-mass spectrometry. An alternating decision tree (ADTree)-based machine learning (ML) method was used to assess the discrimination abilities of the quantified metabolites. A total of 2602 unstimulated saliva samples were collected from subjects with CRC (n = 235), AD (n = 50), and HC (n = 2317). Data were randomly divided into training (n = 1301) and validation (n = 1301) datasets. The clustering analysis showed a clear consistency of aberrant metabolites between the two datasets. The ADTree model was optimized through cross-validation (CV) using the training dataset, and the developed model was validated using the validation dataset. The model discriminating CRC + AD from HC showed an area under the receiver operating characteristic curve (AUC) of 0.860 (95% confidence interval [CI]: 0.828-0.891) for CV and 0.870 (95% CI: 0.837-0.903) for the validation dataset. The other model, discriminating CRC from AD + HC, showed AUCs of 0.879 (95% CI: 0.851-0.907) for CV and 0.870 (95% CI: 0.838-0.902) for the validation dataset. Salivary metabolomics combined with ML demonstrated high accuracy and versatility in detecting CRC.
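The evaluation steps described in the abstract (a random 1301/1301 split, then an AUC with a 95% CI) can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier scores are synthetic, the ADTree model itself is not reproduced, and the percentile bootstrap is an assumed way to obtain the interval, since the abstract does not state how the CIs were computed. The function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC via the Mann-Whitney rank-sum statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not from the paper)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Cohort sizes taken from the abstract: CRC = 235, AD = 50, HC = 2317.
cohort = ["CRC"] * 235 + ["AD"] * 50 + ["HC"] * 2317
rng = random.Random(42)
rng.shuffle(cohort)
train, valid = cohort[:1301], cohort[1301:]

# CRC + AD vs HC labelling, with SYNTHETIC scores standing in for model output.
y_valid = [1 if g in ("CRC", "AD") else 0 for g in valid]
s_valid = [rng.gauss(1.0 if y else 0.0, 1.0) for y in y_valid]

point = auc(y_valid, s_valid)
low, high = bootstrap_ci(y_valid, s_valid)
print(f"validation AUC = {point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

Because the scores are simulated, the printed values bear no relation to the AUCs reported above; only the cohort sizes and the split sizes come from the abstract.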

Keywords: biomarker; colorectal cancer; metabolomics; polyamine; saliva.

MeSH terms

  • Adenoma* / diagnosis
  • Adenoma* / metabolism
  • Biomarkers, Tumor / metabolism
  • Chromatography, Liquid
  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / metabolism
  • Humans
  • Machine Learning
  • Metabolomics / methods


Substances

  • Biomarkers, Tumor