Harnessing Code Interpreters for Enhanced Predictive Modeling: A Case Study on High-Density Lipoprotein Level Estimation in Romanian Diabetic Patients

J Pers Med. 2023 Oct 6;13(10):1466. doi: 10.3390/jpm13101466.


Diabetes is a condition accompanied by the alteration of body parameters, including those related to lipids like triglyceride (TG), low-density lipoproteins (LDLs), and high-density lipoproteins (HDLs). The latter are grouped under the term dyslipidemia and are considered a risk factor for cardiovascular events. In the present work, we analyzed the complex relationships between twelve parameters (disease status, age, sex, body mass index, systolic blood pressure, diastolic blood pressure, TG, HDL, LDL, glucose, HbA1c levels, and disease onset) of patients with diabetes from Romania. An initial prospective analysis showed that HDL is inversely correlated with most of the parameters; therefore, we further analyzed the dependence of HDLs on the other factors. The analysis was conducted with the Code Interpreter plugin of ChatGPT, which was used to build several models from which Random Forest performed best. The principal predictors of HDLs were TG, LDL, and HbA1c levels. Random Forest models were used to model all parameters, showing that blood pressure and HbA1c can be predicted based on the other parameters with the least error, while the less predictable parameters were TG and LDL levels. By conducting the present study using the ChatGPT Code Interpreter, we show that elaborate analysis methods are at hand and easy to apply by researchers with limited computational resources. The insight that can be gained from such an approach, such as what we obtained on HDL level predictors in diabetes, could be relevant for deriving novel management strategies and therapeutic approaches.

Keywords: data analysis; data modeling; diabetes; management strategies; parameters; predictors.

Grants and funding

This research received no external funding.