Predicting Cardiovascular Risk in Athletes: Resampling Improves Classification Performance

Int J Environ Res Public Health. 2020 Oct 28;17(21):7923. doi: 10.3390/ijerph17217923.


Cardiovascular diseases are the main cause of death worldwide. The aim of the present study is to evaluate the performance of a data mining methodology in assessing cardiovascular risk in athletes, and to determine whether the results may be used to support clinical decision making. Anthropometric (height and weight), demographic (age and sex) and biomedical (blood pressure and pulse rate) data of 26,002 athletes were collected in 2012 during routine sport medical examinations, which included electrocardiography at rest. Subjects were involved in competitive sport practice, for which medical clearance was needed. Outcomes were negative for the large majority, as expected in an active population. Resampling was applied to balance the positive/negative class ratio. A decision tree and logistic regression were used to classify individuals as either at risk or not. The receiver operating characteristic curve was used to assess classification performance. Data mining and resampling improved cardiovascular risk assessment, as shown by an increased area under the curve. The proposed methodology can be effectively applied to biomedical data in order to optimize clinical decision making and, at the same time, minimize the number of unnecessary examinations.
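The abstract does not say which resampling scheme was used, so the following is only a minimal sketch of the general idea, assuming random oversampling of the minority class and a rank-based computation of the area under the ROC curve. The function names `oversample_minority` and `roc_auc` and the toy values are hypothetical illustrations, not the study's code or data.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size.

    Assumption: plain random oversampling; the study may have used another scheme.
    Both classes must be present in `labels` (coded 0 = negative, 1 = positive).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw extra minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive case receives a higher score than a random negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 8 negative and 2 positive cases, one feature each.
toy_rows = [[float(i)] for i in range(10)]
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = oversample_minority(toy_rows, toy_labels)

# Toy scores from a hypothetical classifier, evaluated on untouched labels.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a workflow of this kind, resampling is normally applied to the training portion only, and the area under the curve is computed on data left at its original class ratio, so that the reported performance reflects the real prevalence.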

Keywords: decision tree; logistic regression; machine learning; medical diagnostic.

MeSH terms

  • Athletes*
  • Cardiovascular Diseases* / diagnosis
  • Cardiovascular Diseases* / epidemiology
  • Female
  • Heart Disease Risk Factors*
  • Humans
  • Male
  • ROC Curve