The Necessity of Multiple Data Sources for ECG-Based Machine Learning Models

Lucas Plagwitz; Tobias Vogelsang; Florian Doldi; Lucas Bickmann; Michael Fujarski; Lars Eckardt; Julian Varghese

doi:10.3233/SHTI230059

Stud Health Technol Inform. 2023 May 18;302:33-37. doi: 10.3233/SHTI230059.

Authors

Lucas Plagwitz¹, Tobias Vogelsang¹, Florian Doldi², Lucas Bickmann¹, Michael Fujarski¹, Lars Eckardt², Julian Varghese¹

Affiliations

¹ Institute of Medical Informatics, University of Münster, Germany.
² Department for Cardiology II-Electrophysiology, University Hospital Münster, Germany.

PMID: 37203604

Abstract

Even though interest in machine learning studies is growing significantly, especially in medicine, the gap between study results and clinical relevance is more pronounced than ever. Reasons for this include data quality and interoperability issues. Hence, we aimed to examine site- and study-specific differences in publicly available standard electrocardiogram (ECG) datasets, which in theory should be interoperable by virtue of a consistent 12-lead definition, sampling rate, and measurement duration. The focus lies on whether even slight study peculiarities can affect the stability of trained machine learning models. To this end, the performance of modern network architectures as well as unsupervised pattern detection algorithms is investigated across different datasets. Overall, this is intended to examine how well machine learning results from single-site ECG studies generalize.
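The distinction between formal interoperability and actual model transfer can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each dataset is reduced to metadata plus toy feature vectors, and it substitutes a trivial nearest-centroid classifier for the network architectures the study evaluates. The names `Site`, `is_formally_interoperable` and `cross_site_accuracy`, the 500 Hz / 10 s metadata, and the synthetic one-dimensional features are all hypothetical. The sketch trains on each site and tests on every site, so that diagonal entries correspond to internal validation and off-diagonal entries to external validation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (hypothetical feature vector, class label); real ECG signals are not modelled here
Sample = Tuple[List[float], int]

STANDARD_12_LEADS = ("I", "II", "III", "aVR", "aVL", "aVF",
                     "V1", "V2", "V3", "V4", "V5", "V6")


@dataclass
class Site:
    """One single-site ECG dataset, reduced to metadata plus toy features."""
    name: str
    sampling_rate_hz: int
    duration_s: float
    leads: Tuple[str, ...]
    train: List[Sample]
    test: List[Sample]


def is_formally_interoperable(a: Site, b: Site) -> bool:
    """True if both sites share lead definition, sampling rate and duration."""
    return (a.leads == b.leads
            and a.sampling_rate_hz == b.sampling_rate_hz
            and a.duration_s == b.duration_s)


def fit_nearest_centroid(samples: Sequence[Sample]) -> Dict[int, List[float]]:
    """Toy stand-in for a real model: one mean feature vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_nearest_centroid(model: Dict[int, List[float]], x: List[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: List[float]) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(model, key=lambda label: sq_dist(model[label]))


def cross_site_accuracy(sites: Sequence[Site]) -> Dict[Tuple[str, str], float]:
    """Train on each site's train split, evaluate on every site's test split.

    (src, src) entries are internal validation; (src, other) entries are
    external validation on a site the model never saw during training.
    """
    results: Dict[Tuple[str, str], float] = {}
    for src in sites:
        model = fit_nearest_centroid(src.train)
        for dst in sites:
            correct = sum(predict_nearest_centroid(model, x) == y for x, y in dst.test)
            results[(src.name, dst.name)] = correct / len(dst.test)
    return results


# Two synthetic sites with identical metadata; site "B" carries a constant
# feature offset, standing in for an unspecified site-specific peculiarity.
site_a = Site("A", 500, 10.0, STANDARD_12_LEADS,
              train=[([0.0], 0), ([0.2], 0), ([1.0], 1), ([1.2], 1)],
              test=[([0.1], 0), ([1.1], 1)])
site_b = Site("B", 500, 10.0, STANDARD_12_LEADS,
              train=[([0.8], 0), ([1.0], 0), ([1.8], 1), ([2.0], 1)],
              test=[([0.9], 0), ([1.9], 1)])

print(is_formally_interoperable(site_a, site_b))  # True
for (src, dst), acc in sorted(cross_site_accuracy([site_a, site_b]).items()):
    print(f"train {src} -> test {dst}: {acc:.2f}")
# train A -> test A: 1.00
# train A -> test B: 0.50
# train B -> test A: 0.50
# train B -> test B: 1.00
```

In this toy setup both sites pass the metadata check, yet each model drops from perfect internal accuracy to chance level on the other site. This mirrors the question the abstract raises: a consistent lead definition, sampling rate and duration alone do not guarantee that single-site results generalize.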

Keywords: ECG; data integration; external validation; machine learning.

MeSH terms

Algorithms
Data Accuracy
Electrocardiography
Information Sources*
Machine Learning*