Development and validation of Age-Specific algorithms for diabetes prediction

Endocrine. 2025 Dec;90(3):1253-1262. doi: 10.1007/s12020-025-04428-z. Epub 2025 Sep 23.

Abstract

Diabetes mellitus (DM) has a higher incidence among older adults. This study aimed to develop age-specific DM prediction models using machine learning (ML) and anomaly detection algorithms. We included 489,073 participants from Kanazawa City and 31,923 from Hakui City who underwent health check-ups. Four models were constructed to predict DM onset within three years: a Light Gradient Boosting Machine (LGBM), TabNet, a Variational Autoencoder (VAE), and an Isolation Forest (IF). The models were trained on the Kanazawa dataset and externally validated on the Hakui dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The LGBM model achieved the highest AUC across multiple age groups in both internal and external validation. For participants in their 50s and 60s, the LGBM achieved AUC values of 0.911 in internal validation, with sensitivity and specificity exceeding those of the other models. In contrast, the IF model performed best for participants in their 40s. These findings suggest that age-specific models may improve diabetes prediction accuracy within the study population. Further validation in more diverse populations and younger age groups is recommended.
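The abstract reports performance separately for each age decade (40s, 50s, 60s) using AUC, sensitivity, and specificity. The sketch below illustrates how such age-stratified metrics can be computed from any model's risk scores (for example, a predicted probability from a gradient-boosting classifier, or an anomaly score from an Isolation Forest where higher means more anomalous). It is a minimal, standard-library-only illustration, not the authors' pipeline: the function names, the decade binning, the 0.5 decision threshold, and the toy records are all hypothetical assumptions, since the abstract does not state how operating points were chosen.

```python
from collections import defaultdict


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.
    The threshold is a hypothetical operating point chosen by the caller."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def age_decade(age):
    """Map an age in years to its decade, e.g. 47 -> 40 (the "40s" group)."""
    return (age // 10) * 10


def evaluate_by_age_group(records, threshold=0.5):
    """records: iterable of (age, label, score), where label 1 = DM onset within
    the follow-up window. Returns per-decade AUC, sensitivity, and specificity."""
    groups = defaultdict(lambda: ([], []))
    for age, y, s in records:
        labels, scores = groups[age_decade(age)]
        labels.append(y)
        scores.append(s)
    results = {}
    for decade in sorted(groups):
        labels, scores = groups[decade]
        sens, spec = sens_spec(labels, scores, threshold)
        results[decade] = {
            "auc": auc(labels, scores),
            "sensitivity": sens,
            "specificity": spec,
        }
    return results


if __name__ == "__main__":
    # Hypothetical toy records: (age, onset label, model risk score).
    toy = [(45, 1, 0.9), (47, 0, 0.2), (43, 0, 0.6),
           (52, 1, 0.8), (55, 0, 0.3), (58, 1, 0.4),
           (61, 0, 0.1), (66, 1, 0.7)]
    for decade, metrics in evaluate_by_age_group(toy).items():
        print(f"{decade}s", metrics)
```

Computing AUC as a pairwise ranking probability makes clear that it is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off; a real evaluation would also need enough positive and negative cases in every age group, which this sketch only checks implicitly by raising an error.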

Supplementary Information: The online version contains supplementary material available at 10.1007/s12020-025-04428-z.

Keywords: Age-specific models; Anomaly detection; Light gradient boosting machine; Machine learning.