Multi-marker discovery for mild cognitive impairment in metabolomics using machine learning with a global surrogate model via partial least squares

Metabolomics. 2025 Nov 15;21(6):164. doi: 10.1007/s11306-025-02372-7.

Abstract

Introduction: Dementia can be prevented through early intervention; hence, there is an urgent need for biomarkers to help diagnose mild cognitive impairment (MCI).

Objectives: We aimed to develop a multi-marker panel composed of plasma metabolites to aid in the diagnosis of MCI.

Methods: We analyzed a multi-marker panel of plasma metabolites for MCI using a random forest algorithm with variable selection methods, and interpreted it with a global surrogate model built with principal component analysis and partial least squares (PLS).
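
As a rough illustration of the random forest step with variable selection, the following is a minimal sketch and not the authors' pipeline. It assumes scikit-learn, uses synthetic data in place of plasma metabolite measurements, and uses importance-based selection of a fixed-size panel as a stand-in for whichever variable selection methods the study applied; all variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Assumption: synthetic stand-in for a (samples x metabolites) matrix with MCI/control labels.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

# Assumption: keep the 5 features with the highest random forest importance.
# threshold=-np.inf makes max_features the only selection criterion.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    max_features=5, threshold=-np.inf)

# Selection and classification share one pipeline, so the panel is re-selected
# inside every cross-validation fold.
panel_model = make_pipeline(
    selector, RandomForestClassifier(n_estimators=200, random_state=0))

auc_scores = cross_val_score(panel_model, X, y, cv=5, scoring="roc_auc")
mean_auc = auc_scores.mean()

# Refit on all data to see which columns end up in the panel.
panel_model.fit(X, y)
selected = np.flatnonzero(panel_model.named_steps["selectfrommodel"].get_support())
print("cross-validated AUC:", round(mean_auc, 3))
print("selected feature indices:", selected)
```

Placing the selector inside the pipeline keeps held-out samples from influencing which features are chosen, so the cross-validated AUC is less likely to be optimistic.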

Results: With variable selection, we built a predictive model that used only five metabolites (methionine, quinic acid, hypoxanthine, O-acetylcarnitine, and 2-oxoglutaric acid) and achieved an AUC of approximately 0.85 in both cross-validation and test evaluations. Because so few metabolites were selected, however, the biological meaning of the panel was difficult to infer. To interpret the panel biologically, we constructed a global surrogate model using PLS. By examining the PLS loadings corresponding to the scores that differed between groups, we identified relationships among 14 metabolites involved in neuronal energy metabolism and neurotransmission. This suggests that the multi-marker panel constructed in this study is related to abnormalities in energy metabolism and neurotransmission in patients with MCI.

Conclusion: The method used in this study may be broadly applicable to the analysis of multi-marker metabolite panels and their biological interpretation. Although this study included an independent validation, larger-scale studies using additional external cohorts are warranted to confirm the generalizability of this approach.

Keywords: Alzheimer’s disease with dementia; Biomarker; Cohort; Global surrogate model; Metabolome; Mild cognitive impairment; Partial least squares; Random forest model.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms
  • Biomarkers* / blood
  • Biomarkers* / metabolism
  • Cognitive Dysfunction* / blood
  • Cognitive Dysfunction* / diagnosis
  • Cognitive Dysfunction* / metabolism
  • Female
  • Humans
  • Least-Squares Analysis
  • Machine Learning*
  • Male
  • Metabolomics* / methods
  • Principal Component Analysis

Substances

  • Biomarkers