Ranking and filtering of neuropathology features in the machine learning evaluation of dementia studies

Mohammed D Rajab; Teruka Taketa; Stephen B Wharton; Dennis Wang; Cognitive Function and Ageing Neuropathology Study, and for the Alzheimer's Disease Neuroimaging Initiative

doi:10.1111/bpa.13247

Brain Pathol. 2024 Feb 19:e13247. doi: 10.1111/bpa.13247. Online ahead of print.

Authors

Mohammed D Rajab¹ ², Teruka Taketa¹, Stephen B Wharton¹, Dennis Wang¹ ² ³ ⁴ ⁵; Cognitive Function and Ageing Neuropathology Study, and for the Alzheimer's Disease Neuroimaging Initiative

Affiliations

¹ Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK.
² Department of Computer Science, University of Sheffield, Sheffield, UK.
³ Singapore Institute Clinical Sciences, Agency for Science Technology and Research (A*STAR), Singapore, Singapore.
⁴ Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), Singapore, Singapore.
⁵ National Heart and Lung Institute, Imperial College London, London, UK.

PMID: 38374326
DOI: 10.1111/bpa.13247

Abstract

Early diagnosis of dementia diseases, such as Alzheimer's disease, is difficult because of the time and resources needed to perform neuropsychological and pathological assessments. Given the increasing use of machine learning methods to evaluate neuropathology features in the brains of dementia patients, it is important to investigate how the selection of features may be impacted and which features are most important for the classification of dementia. We objectively assessed neuropathology features using machine learning techniques for filtering features in two independent ageing cohorts, the Cognitive Function and Ageing Studies (CFAS) and Alzheimer's Disease Neuroimaging Initiative (ADNI). The reliefF and least loss methods were most consistent with their rankings between ADNI and CFAS; however, reliefF was most biassed by feature-feature correlations. Braak stage was consistently the highest ranked feature and its ranking was not correlated with other features, highlighting its unique importance. Using a smaller set of highly ranked features, rather than all features, can achieve a similar or better dementia classification performance in CFAS (60%-70% accuracy with Naïve Bayes). This study showed that specific neuropathology features can be prioritised by feature filtering methods, but they are impacted by feature-feature correlations and their results can vary between cohort studies. By understanding these biases, we can reduce discrepancies in feature ranking and identify a minimal set of features needed for accurate classification of dementia.

Keywords: Alzheimer's disease; collinearity; dementia; feature selection; machine learning; neuropathology.
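
The general workflow described in the abstract (rank features with a filter, then classify with only the top-ranked ones) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the data are synthetic rather than CFAS or ADNI, the filter is a plain absolute-correlation score standing in for methods such as reliefF or least loss, the Gaussian Naïve Bayes is a small hand-written version, and all function names are hypothetical. Feature 1 is built as a noisy copy of feature 0 to mimic the feature-feature correlation the abstract discusses.

```python
import math
import random
from statistics import mean, pvariance

def make_data(n=400, seed=0):
    """Synthetic stand-in data (not CFAS/ADNI): 1 informative feature,
    1 noisy copy of it (collinear), and 3 pure-noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(2.0 * label, 1.0)
        correlated = informative + rng.gauss(0.0, 1.0)
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append([informative, correlated] + noise)
        y.append(label)
    return X, y

def abs_correlation(col, labels):
    """Filter score: absolute Pearson correlation between a feature and the label."""
    ys = [float(v) for v in labels]
    mx, my = mean(col), mean(ys)
    cov = mean((a - mx) * (b - my) for a, b in zip(col, ys))
    return abs(cov) / math.sqrt(pvariance(col) * pvariance(ys))

def rank_features(X, y):
    """Return feature indices sorted from highest to lowest filter score."""
    scores = [abs_correlation([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def fit_gnb(X, y):
    """Fit per-class prior, feature means, and feature variances."""
    model = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        cols = list(zip(*rows))
        model[c] = (len(rows) / len(y),
                    [mean(col) for col in cols],
                    [pvariance(col) + 1e-9 for col in cols])
    return model

def predict_gnb(model, row):
    """Pick the class with the highest log posterior under independence."""
    def log_post(c):
        prior, mus, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, mus, variances))
    return max(model, key=log_post)

def accuracy(X_train, y_train, X_test, y_test, features):
    """Train and test using only the listed feature indices."""
    def pick(X):
        return [[row[j] for j in features] for row in X]
    model = fit_gnb(pick(X_train), y_train)
    preds = [predict_gnb(model, row) for row in pick(X_test)]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

X, y = make_data()
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

# Rank on the training split only, so the test split stays unseen.
ranking = rank_features(X_train, y_train)
top_k = ranking[:2]

acc_all = accuracy(X_train, y_train, X_test, y_test, list(range(len(X[0]))))
acc_top = accuracy(X_train, y_train, X_test, y_test, top_k)
print("ranking:", ranking)
print("accuracy, all features:", round(acc_all, 3))
print("accuracy, top 2 features:", round(acc_top, 3))
```

In this toy setting the noisy copy of the informative feature also scores well under the filter, which is one simple way a correlated feature can occupy a top rank without adding independent information.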
