Introduction: Alzheimer's disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Early diagnosis is vital. We developed an interpretable machine learning (ML) model for early AD prediction using open clinical data.
Methods: Data from 2149 adults (aged 60-90 years) were obtained from Kaggle. After preprocessing and feature engineering, tree-based models were trained. A stacking ensemble combining Gradient Boosting and XGBoost was then built, with Logistic Regression as the meta-learner. SHapley Additive exPlanations (SHAP) provided interpretability. Performance was measured by accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC).
Results: The stacked ensemble achieved 97% accuracy (AUC 0.97), with 0.97 precision, 0.94 recall, and 0.96 F1 score for AD. SHAP identified memory complaints, Mini-Mental State Examination (MMSE), functional assessment, behavioral symptoms, cholesterol, and lifestyle factors (activity, diet, sleep) as top predictors.
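For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, and F1 score follow from a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted positives that are true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two.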
Conclusion: The ensemble model, enhanced by SHAP analysis, provides accurate and interpretable AD risk predictions with potential applicability in future clinical decision support systems.
Highlights: Developed an ensemble machine learning (ML) model for early Alzheimer's disease (AD) prediction. Achieved 97% accuracy using stacked XGBoost and Gradient Boosting. SHapley Additive exPlanations (SHAP) analysis identified key cognitive and lifestyle-related risk factors. Model interprets AD risk using explainable artificial intelligence (AI) for clinical applicability. Utilized open-access dataset to ensure reproducibility and transparency.
Keywords: Alzheimer's disease; SHAP; cognitive impairment; ensemble model; explainable AI; machine learning.
© 2025 The Author(s). Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring published by Wiley Periodicals, LLC on behalf of Alzheimer's Association.