Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict progression of diabetic kidney disease

Diabetologia. 2021 Jul;64(7):1504-1515. doi: 10.1007/s00125-021-05444-0. Epub 2021 Apr 2.


Aim: Predicting progression in diabetic kidney disease (DKD) is critical to improving outcomes. We sought to develop and validate a machine-learned prognostic risk score (KidneyIntelX™) that combines electronic health record (EHR) data and biomarkers.

Methods: This was an observational cohort study of patients with prevalent DKD and banked plasma from two EHR-linked biobanks. A random forest model was trained, and its performance (AUC, positive and negative predictive values [PPV/NPV], and net reclassification index [NRI]) was compared with that of a clinical model and with Kidney Disease: Improving Global Outcomes (KDIGO) categories for predicting a composite outcome of eGFR decline of ≥5 ml/min per year, ≥40% sustained decline, or kidney failure within 5 years.
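The composite outcome above can be illustrated with a small labelling function. This is a minimal sketch, not the authors' procedure: the abstract does not say how the eGFR slope was estimated or what counted as "sustained", so the code assumes an ordinary least-squares slope, treats "sustained" as two consecutive follow-up values at or below 60% of baseline, and uses a hypothetical `PatientEgfr` record with an assumed `kidney_failure` flag. Measurements are assumed to be sorted by time, with the first one as baseline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PatientEgfr:
    """Hypothetical record: eGFR values (ml/min per 1.73 m^2) at times in years from baseline."""
    times_years: Sequence[float]  # assumed sorted, first entry = baseline
    egfr: Sequence[float]
    kidney_failure: bool = False  # assumed flag: kidney failure within the horizon


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x (eGFR change per year)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0  # all measurements at one time point: slope not estimable
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def composite_outcome(p: PatientEgfr, horizon_years: float = 5.0,
                      slope_cutoff: float = 5.0, pct_decline: float = 0.40) -> bool:
    """True if any component of the composite endpoint is met within the horizon."""
    if p.kidney_failure:
        return True
    # Keep only measurements inside the prediction window.
    pts = [(t, e) for t, e in zip(p.times_years, p.egfr) if t <= horizon_years]
    if len(pts) < 2:
        return False
    times = [t for t, _ in pts]
    values = [e for _, e in pts]
    # Component 1: eGFR decline of at least 5 ml/min per year (slope <= -5).
    if ols_slope(times, values) <= -slope_cutoff:
        return True
    # Component 2: >=40% decline from baseline. "Sustained" is ASSUMED here to mean
    # two consecutive follow-up values at or below 60% of the baseline value.
    threshold = values[0] * (1 - pct_decline)
    below = [v <= threshold for v in values[1:]]
    return any(a and b for a, b in zip(below, below[1:]))
```

For example, a patient whose eGFR falls from 60 to 35 and 34 late in follow-up meets the 40% component even though the fitted slope is shallower than -5 ml/min per year, while a single transient dip does not.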

Results: In 1146 patients, the median age was 63 years, 51% were female, the baseline eGFR was 54 ml min⁻¹ [1.73 m]⁻², the urine albumin to creatinine ratio (uACR) was 6.9 mg/mmol, follow-up was 4.3 years and 21% had the composite endpoint. On cross-validation in derivation (n = 686), KidneyIntelX had an AUC of 0.77 (95% CI 0.74, 0.79). In validation (n = 460), the AUC was 0.77 (95% CI 0.76, 0.79). By comparison, the AUC for the clinical model was 0.62 (95% CI 0.61, 0.63) in derivation and 0.61 (95% CI 0.60, 0.63) in validation. Using derivation cut-offs, KidneyIntelX stratified 46%, 37% and 17% of the validation cohort into low-, intermediate- and high-risk groups for the composite kidney endpoint, respectively. The PPV for progressive decline in kidney function in the high-risk group was 61% for KidneyIntelX vs 40% for the highest risk stratum by KDIGO categorisation (p < 0.001). Only 10% of those scored as low risk by KidneyIntelX experienced progression (i.e., an NPV of 90%). The event NRI (NRIevent) for the high-risk group was 41% (p < 0.05).
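The metrics reported above can be made concrete with textbook definitions. This is a hedged sketch under stated assumptions: the abstract does not give the actual cut-offs, the NRI variant, or the confidence-interval method, so the cut-off values are placeholders and `event_nri` implements the generic category-based event NRI (share of events moved to a higher category minus share moved to a lower one), which may differ from the authors' calculation. All function names are hypothetical.

```python
from typing import Sequence

ORDER = {"low": 0, "intermediate": 1, "high": 2}


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case scores above a random non-case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratify(score: float, low_cut: float, high_cut: float) -> str:
    """Map a risk score to a category using two cut-offs fixed in derivation."""
    if score < low_cut:
        return "low"
    return "high" if score >= high_cut else "intermediate"


def ppv(groups: Sequence[str], outcomes: Sequence[int], group: str = "high") -> float:
    """Share of patients in `group` who had the outcome."""
    ys = [y for g, y in zip(groups, outcomes) if g == group]
    return sum(ys) / len(ys)


def npv(groups: Sequence[str], outcomes: Sequence[int], group: str = "low") -> float:
    """Share of patients in `group` who did NOT have the outcome."""
    return 1.0 - ppv(groups, outcomes, group)


def event_nri(old_groups: Sequence[str], new_groups: Sequence[str],
              outcomes: Sequence[int]) -> float:
    """Generic event NRI: P(moved up | event) - P(moved down | event)."""
    moves = [ORDER[n] - ORDER[o]
             for o, n, y in zip(old_groups, new_groups, outcomes) if y]
    up = sum(1 for m in moves if m > 0)
    down = sum(1 for m in moves if m < 0)
    return (up - down) / len(moves)
```

As a check against the text, if 1 of every 10 low-risk patients progresses, `npv` returns 0.9, matching the reported relation between a 10% event rate and an NPV of 90%.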

Conclusions: KidneyIntelX improved prediction of kidney outcomes over KDIGO and clinical models in individuals with early stages of DKD.

Keywords: Biomarkers; Diabetic kidney disease; Electronic data; Machine learning; Prediction.

Publication types

  • Observational Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Biomarkers / analysis*
  • Cohort Studies
  • Diabetic Nephropathies / diagnosis*
  • Diabetic Nephropathies / epidemiology
  • Diabetic Nephropathies / pathology
  • Disease Progression
  • Electronic Health Records* / statistics & numerical data
  • Female
  • Glomerular Filtration Rate
  • Humans
  • Kidney Function Tests / statistics & numerical data
  • Machine Learning*
  • Male
  • Middle Aged
  • Predictive Value of Tests
  • Prognosis
  • Risk Factors
  • United States / epidemiology
  • Young Adult


Substances

  • Biomarkers