Background: Racial and ethnic minority groups and individuals facing social disadvantages, which often stem from their social determinants of health (SDoH), bear a disproportionate burden of type 2 diabetes (T2D) and its complications. It is crucial to implement effective social risk management strategies at the point of care.
Objective: To develop an electronic health records (EHR)-based machine learning (ML) analytical pipeline to address unmet social needs associated with hospitalization risk in patients with T2D.
Methods: We identified real-world patients with T2D in EHR data from the University of Florida (UF) Health Integrated Data Repository (IDR) and incorporated both contextual SDoH (e.g., neighborhood deprivation) and individual-level SDoH (e.g., housing instability). Data from 2015-2020 were used for training and validation, and data from 2021-2022 for independent testing. We developed an ML analytic pipeline, the individualized polysocial risk score (iPsRS), to identify high social risk associated with hospitalization in patients with T2D, together with explainable AI (XAI) and fairness optimization.
Results: The study cohort included 10,192 real-world patients with T2D, with a mean age of 59 years; 58% were female. Of the cohort, 50% were non-Hispanic White, 39% were non-Hispanic Black, 6% were Hispanic, and 5% were of other races/ethnicities. Our iPsRS, which included both contextual and individual-level SDoH as input factors, achieved a C statistic of 0.72 in predicting 1-year hospitalization after fairness optimization across racial and ethnic groups. The iPsRS showed excellent utility for capturing individuals at high hospitalization risk because of SDoH: the actual 1-year hospitalization rate in the top 5% of iPsRS was 28.1%, approximately 13 times the rate in the bottom decile (2.2%).
Conclusion: Our ML pipeline, iPsRS, can fairly and accurately screen real-world patients with T2D for increased social risk leading to hospitalization.
Keywords: Fairness; Machine learning; Prediction; Type 2 diabetes.
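
As an illustration of the two quantities reported in the Results, the sketch below shows one way a C statistic and a comparison of hospitalization rates across score strata could be computed. It is not the study's code: the data are synthetic, the outcome model is invented for the demo, and the function names `c_statistic` and `stratum_rate` are hypothetical. Only the final ratio uses the rates reported above (28.1% and 2.2%).

```python
import random


def c_statistic(scores, outcomes):
    """Fraction of (event, non-event) pairs in which the event has the
    higher score; ties count as half. Equivalent to the ROC AUC."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def stratum_rate(scores, outcomes, lower, upper):
    """Observed event rate among patients whose score rank falls in the
    [lower, upper) fraction of the cohort (0.0 = lowest scores)."""
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda p: p[0])]
    n = len(ranked)
    chunk = ranked[round(lower * n):round(upper * n)]
    if not chunk:
        raise ValueError("stratum is empty")
    return sum(chunk) / len(chunk)


# Synthetic cohort (assumption): risk rises with the score.
random.seed(0)
scores = [random.random() for _ in range(2000)]
outcomes = [1 if random.random() < 0.02 + 0.3 * s ** 2 else 0 for s in scores]

print("C statistic:", round(c_statistic(scores, outcomes), 3))
print("Top 5% rate:", round(stratum_rate(scores, outcomes, 0.95, 1.0), 3))
print("Bottom decile rate:", round(stratum_rate(scores, outcomes, 0.0, 0.1), 3))

# Ratio of the rates reported in the Results (28.1% vs 2.2%).
reported_ratio = 28.1 / 2.2
print("Reported rate ratio:", round(reported_ratio, 1))
```

The pairwise loop is chosen for readability; for a cohort of about 10,000 patients a rank-based calculation or a library routine would be faster.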