Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths

BMC Med Inform Decis Mak. 2022 Jul 25;22(1):196. doi: 10.1186/s12911-022-01943-1.

Abstract

Background: Heart failure is a clinical syndrome characterised by a reduced ability of the heart to pump blood. Patients with heart failure have a high mortality rate, and physicians need reliable prognostic predictions to make informed decisions about the appropriate application of devices, transplantation, medications, and palliative care. In this study, we demonstrate that combining symbolic regression with the Cox proportional hazards model improves the ability to predict death due to heart failure compared to using the Cox proportional hazards model alone.

Methods: We used a newly invented symbolic regression method called the QLattice to analyse a data set of medical records for 299 Pakistani patients diagnosed with heart failure. The QLattice identified non-linear mathematical transformations of the available covariates, which we then used in a Cox model to predict survival.
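In symbols, the approach described above can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: h_0(t) is the baseline hazard, x_j are the raw covariates, beta_j are the Cox coefficients, and f_j are the non-linear transformations proposed by the symbolic regression step.

```latex
% Standard Cox proportional hazards model on raw covariates x_1, ..., x_p
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} \beta_j x_j\Big)

% Same model on transformed covariates f_j(x_j), with f_j found by symbolic regression
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} \beta_j f_j(x_j)\Big)
```

The second form stays linear in the coefficients, so it can be fitted with the usual Cox partial likelihood once the transformations are fixed.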

Results: An exponential function of age, the inverse of ejection fraction, and the inverse of serum creatinine were identified as the best risk factors for predicting heart failure deaths. A Cox model fitted on these transformed covariates had better predictive performance than a Cox model fitted on the same covariates without the transformations.
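A minimal sketch of how the three transformed covariates named above could be computed and fed into a Cox-style linear predictor. The function names, the `age_scale` constant inside the exponential, the units, and the coefficients are all hypothetical placeholders chosen for illustration; the abstract does not report the exact functional forms or the fitted values.

```python
import math


def transform_covariates(age, ejection_fraction, serum_creatinine, age_scale=0.01):
    """Return (exp(age_scale * age), 1 / ejection_fraction, 1 / serum_creatinine).

    age_scale is a hypothetical constant; the abstract only says
    "an exponential function of age" without giving its exact form.
    """
    if ejection_fraction <= 0 or serum_creatinine <= 0:
        raise ValueError("ejection fraction and serum creatinine must be positive")
    return (
        math.exp(age_scale * age),
        1.0 / ejection_fraction,
        1.0 / serum_creatinine,
    )


def hazard_ratio(features_a, features_b, coefficients):
    """Cox hazard ratio of patient A relative to patient B.

    Under proportional hazards the baseline hazard cancels, leaving
    exp(sum_j beta_j * (z_aj - z_bj)).
    """
    linear_difference = sum(
        beta * (z_a - z_b)
        for beta, z_a, z_b in zip(coefficients, features_a, features_b)
    )
    return math.exp(linear_difference)


# Placeholder coefficients, NOT the fitted values from the study.
HYPOTHETICAL_BETAS = (1.0, 1.0, -1.0)

# Two made-up patients (assumed units: years, percent, mg/dL).
patient_a = transform_covariates(age=70, ejection_fraction=25, serum_creatinine=2.0)
patient_b = transform_covariates(age=50, ejection_fraction=50, serum_creatinine=1.0)
ratio = hazard_ratio(patient_a, patient_b, HYPOTHETICAL_BETAS)
```

In practice the coefficients would be estimated by fitting a Cox model to the transformed columns of the data set rather than set by hand.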

Conclusion: Symbolic regression can find transformations of covariates from patients' medical records that improve the performance of survival regression models. At the same time, these simple functions are intuitive and easy to apply in clinical settings. The direct interpretability of the simple forms may help researchers gain new insights into the actual causal pathways leading to deaths.

Keywords: Cardiovascular heart diseases; Heart failure; Machine learning; Proportional hazards model; QLattice; Symbolic regression.

MeSH terms

  • Heart Failure*
  • Humans
  • Proportional Hazards Models
  • Regression Analysis
  • Risk Factors
  • Stroke Volume