Comparison of LASSO and random forest models for predicting the risk of premature coronary artery disease

BMC Med Inform Decis Mak. 2023 Dec 20;23(1):297. doi: 10.1186/s12911-023-02407-w.


Purpose: As lifestyles change, coronary artery disease is occurring at increasingly younger ages, adding to the medical and economic burden on families and society. To reduce this burden, this study applied LASSO logistic regression and random forest separately to build risk prediction models for premature coronary artery disease (PCAD) and compared the predictive performance of the two models.

Methods: Data were obtained from 1004 patients with coronary artery disease admitted to a third-class hospital in Liaoning Province from September 2019 to December 2021. Data from 797 patients were ultimately evaluated. These 797 patients were randomly divided into a training set (569 patients) and a validation set (228 patients) at a ratio of 7:3. Risk prediction models were built with LASSO logistic regression and random forest, and their performance was compared.
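The random split described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the helper name `split_indices`, the seed, and the use of simple shuffling (rather than, say, stratified sampling) are assumptions; only the counts 797, 569, and 228 come from the abstract.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into training and validation indices.

    Hypothetical helper for illustration; the abstract does not state
    how the random division was implemented.
    """
    rng = random.Random(seed)          # local generator, reproducible for a given seed
    indices = list(range(n_total))
    rng.shuffle(indices)               # random permutation of patient indices
    train = sorted(indices[:n_train])  # first n_train shuffled indices
    valid = sorted(indices[n_train:])  # the remainder
    return train, valid


# Counts taken from the abstract: 797 patients, 569 training, 228 validation.
train_idx, valid_idx = split_indices(n_total=797, n_train=569, seed=2023)
print(len(train_idx), len(valid_idx))       # 569 228
print(round(len(train_idx) / 797, 3))       # 0.714, i.e. roughly 7:3
```

Fixing the seed makes the partition reproducible, so both models can be trained and validated on exactly the same patients.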

Results: Both models identified hyperuricemia, chronic renal disease, and carotid artery atherosclerosis as important predictors of premature coronary artery disease. The AUCs of the two models differed significantly (Z = 3.47, P < 0.05).
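As a worked check of the reported statistic, the sketch below converts Z = 3.47 into a P value. It assumes the Z statistic is referred to a standard normal distribution and that the test is two-sided; the abstract states neither, and the function name `two_sided_p_from_z` is hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal test statistic.

    Uses the identity P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z value taken from the abstract; normality and two-sidedness are assumptions.
p_value = two_sided_p_from_z(3.47)
print(f"P = {p_value:.4f}")
print(p_value < 0.05)  # True, consistent with the reported P < 0.05
```

Under these assumptions the P value is on the order of 0.0005, well below the 0.05 threshold reported in the abstract.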

Conclusions: Random Forest has better prediction performance for PCAD and is suitable for clinical practice. It can provide an objective reference for the early screening and diagnosis of premature coronary artery disease, guide clinical decision-making and promote disease prevention.

Keywords: Lasso; Premature coronary artery disease; Random forest; Risk prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Clinical Decision-Making
  • Coronary Artery Disease* / diagnosis
  • Coronary Artery Disease* / epidemiology
  • Humans
  • Logistic Models
  • Random Forest
  • Risk Factors