The Necessity of Multiple Data Sources for ECG-Based Machine Learning Models

Stud Health Technol Inform. 2023 May 18:302:33-37. doi: 10.3233/SHTI230059.

Abstract

Although interest in machine learning studies is growing considerably, especially in medicine, the gap between study results and clinical relevance is more pronounced than ever. Contributing factors include data quality and interoperability issues. We therefore aimed to examine site- and study-specific differences in publicly available standard electrocardiogram (ECG) datasets, which in theory should be interoperable because they share a consistent 12-lead definition, sampling rate, and measurement duration. The central question is whether even slight study peculiarities can affect the stability of trained machine learning models. To this end, the performance of modern network architectures and of unsupervised pattern detection algorithms is investigated across different datasets. Overall, the aim is to assess how well machine learning results from single-site ECG studies generalize.

Keywords: ECG; data integration; external validation; machine learning.

MeSH terms

  • Algorithms
  • Data Accuracy
  • Electrocardiography
  • Information Sources*
  • Machine Learning*