Dementia risk predictions from German claims data using methods of machine learning

Alzheimers Dement. 2023 Feb;19(2):477-486. doi: 10.1002/alz.12663. Epub 2022 Apr 22.

Abstract

Introduction: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and what the important predictors for dementia risk are.

Methods: We analyzed data from the largest German health insurance company, including 117,895 dementia-free people aged 65 years and older, with 10 years of follow-up. Predictors were 23 age-related diseases, 212 medical prescriptions, and 87 surgery codes, as well as age and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RF).

Results: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. We identified antipsychotic medications and cerebrovascular disease, as well as a specific, less-established antibacterial prescription, as important predictors.
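
For readers unfamiliar with the C-statistic reported above, the following is a minimal illustrative sketch of how such a value and an interval around it could be computed for a binary outcome. It is not the authors' code: the function names `c_statistic` and `bootstrap_ci`, the toy data, and the use of a percentile bootstrap for the interval are assumptions made here for illustration, and the abstract does not state how the published confidence intervals were derived.

```python
import random


def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied scores count as half a concordant pair."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic.
    Assumption: the paper may have used a different interval method."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked cases or non-cases; draw again
        stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = incident dementia, 0 = no dementia.
outcome = [0, 0, 1, 0, 1, 1, 0, 1]
predicted_risk = [0.10, 0.30, 0.35, 0.40, 0.60, 0.80, 0.20, 0.40]

print(c_statistic(outcome, predicted_risk))  # 14.5 of 16 pairs -> 0.90625
print(bootstrap_ci(outcome, predicted_risk))
```

The pairwise loop is quadratic in sample size and is meant only to make the definition explicit; for a cohort of the size analyzed here, a rank-based formulation would be the practical choice. A C-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which places the reported values of 0.636 to 0.714 in context.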

Discussion: Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.

Keywords: Germany; calibration; dementia; discrimination; health claims data; machine learning; risk factors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Humans
  • Insurance, Health*
  • Logistic Models
  • Machine Learning*
  • Random Forest