A machine learning approach to unmask novel gene signatures and prediction of Alzheimer's disease within different brain regions

Genomics. 2021 Jul;113(4):1778-1789. doi: 10.1016/j.ygeno.2021.04.028. Epub 2021 Apr 18.

Abstract

Alzheimer's disease (AD) is a progressive neurodegenerative disorder whose aetiology is currently unknown. Although numerous studies have attempted to identify the genetic risk factor(s) of AD, the interpretability and/or prediction accuracy of their findings has remained unsatisfactory, limiting their clinical significance. Here, we apply an ensemble of random forest and a regularized regression model (LASSO) to AD-associated microarray datasets from four brain regions (prefrontal cortex, middle temporal gyrus, hippocampus, and entorhinal cortex) to discover novel genetic biomarkers through a machine learning-based feature-selection and classification scheme. The proposed scheme identified optimal, biologically significant classifiers within each brain region, which achieved a markedly higher AD prediction accuracy than previous studies (99% on average in 5-fold cross-validation). Interestingly, along with novel and prominent biomarkers including CORO1C, SLC25A46, RAE1, ANKIB1, CRLF3, and PDYN, numerous non-coding RNA genes were also observed as discriminators, of which AK057435 and BC037880 are uncharacterized long non-coding RNA genes.
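The abstract names the ingredients of the scheme (random forest, LASSO, 5-fold cross-validation) but not how the two selectors are combined. The stdlib-only Python below is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the ensemble keeps genes that are both in the random-forest top `top_k` by importance and given a nonzero LASSO coefficient. To stay dependency-free it also simplifies: the forest is built from depth-1 trees, LASSO is an L1-penalized logistic regression fitted by proximal gradient descent, a nearest-centroid rule stands in for the final classifier, and the expression matrix is synthetic toy data rather than microarray measurements. All function names and parameter values are hypothetical.

```python
import math
import random

def toy_expression_data(n=60, p=20, informative=(0, 1, 2), shift=2.0, seed=0):
    """Hypothetical toy data: label 1 = 'AD', 0 = 'control'; only `informative` genes shift with the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        y.append(i % 2)
        X.append([rng.gauss(0.0, 1.0) + (shift * y[-1] if g in informative else 0.0) for g in range(p)])
    return X, y

def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds, dealing each class round-robin so folds stay balanced."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def gini(pos, total):
    """Gini impurity of a node holding `pos` positives out of `total` samples."""
    q = pos / total if total else 0.0
    return 2.0 * q * (1.0 - q)

def stump_forest_importance(X, y, n_trees=100, seed=0):
    """Simplified random forest of depth-1 trees: bootstrap the rows, try sqrt(p) random genes
    per tree and keep the best Gini split; importance = summed impurity decrease per gene."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    imp = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        total_pos = sum(y[r] for r in rows)
        parent = gini(total_pos, n)
        best_gain, best_f = 0.0, None
        for f in rng.sample(range(p), max(1, int(math.sqrt(p)))):
            pairs = sorted((X[r][f], y[r]) for r in rows)
            left_pos = 0
            for k in range(1, n):
                left_pos += pairs[k - 1][1]
                if pairs[k][0] == pairs[k - 1][0]:
                    continue  # no threshold fits between equal values
                child = (k * gini(left_pos, k) + (n - k) * gini(total_pos - left_pos, n - k)) / n
                if parent - child > best_gain:
                    best_gain, best_f = parent - child, f
        if best_f is not None:
            imp[best_f] += best_gain
    return imp

def l1_logistic(X, y, lam=0.1, lr=0.1, iters=200):
    """LASSO-style L1-penalised logistic regression fitted by proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, b + sum(wj * xj for wj, xj in zip(w, xi))))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            v = w[j] - lr * grad_w[j]
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)  # soft-threshold (L1 proximal step)
    return w, b

def nearest_centroid_predict(X_tr, y_tr, X_te, genes):
    """Stand-in classifier: assign each test sample to the nearer class centroid over `genes`."""
    cent = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        cent[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return [min((0, 1), key=lambda c: sum((x[g] - m) ** 2 for g, m in zip(genes, cent[c]))) for x in X_te]

def cross_validate(X, y, k=5, top_k=5):
    """Assumed combination rule: inside each training fold keep genes in the RF top-k AND with
    a nonzero LASSO weight, then score the held-out fold using only those genes."""
    correct, selections = 0, []
    for fold in stratified_folds(y, k):
        train = [i for i in range(len(y)) if i not in fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        imp = stump_forest_importance(X_tr, y_tr)
        rf_top = set(sorted(range(len(imp)), key=lambda j: -imp[j])[:top_k])
        w, _ = l1_logistic(X_tr, y_tr)
        lasso_nz = {j for j, wj in enumerate(w) if wj != 0.0}
        genes = sorted(rf_top & lasso_nz) or sorted(rf_top)  # fall back to RF genes if intersection is empty
        selections.append(genes)
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in fold], genes)
        correct += sum(pred == y[i] for pred, i in zip(preds, fold))
    return correct / len(y), selections

if __name__ == "__main__":
    acc, selections = cross_validate(*toy_expression_data())
    print(f"5-fold accuracy: {acc:.2f}; genes kept per fold: {selections}")
```

In practice, library models (for example scikit-learn's `RandomForestClassifier` and `LogisticRegression(penalty="l1", solver="liblinear")`) would replace the hand-written ones. The design point the sketch tries to show is that gene selection is repeated inside every training fold. Selecting genes on the full dataset before cross-validation would let held-out samples influence the selection and inflate the reported accuracy.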

Keywords: Alzheimer's disease; Biomarkers; Classification; Feature selection; Gene expression; Machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease* / genetics
  • Brain
  • Humans
  • Machine Learning
  • Mitochondrial Proteins
  • Nuclear Matrix-Associated Proteins
  • Nucleocytoplasmic Transport Proteins
  • Phosphate Transport Proteins

Substances

  • Mitochondrial Proteins
  • Nuclear Matrix-Associated Proteins
  • Nucleocytoplasmic Transport Proteins
  • Phosphate Transport Proteins
  • RAE1 protein, human
  • SLC25A46 protein, human