In this paper, we use the CUR matrix factorization as a means of dimension reduction to identify important subsequences in electrocardiogram (ECG) time series. Other factorizations typically used in dimension reduction characterize data in terms of abstract representatives (for example, an orthogonal basis), whereas the CUR factorization describes the data in terms of actual instances from the original data set. The CUR characterization can therefore be related directly back to the clinical setting. We apply CUR to a synthetic ECG data set as well as to data from the MIT-BIH Arrhythmia, MGH-MF, and Incart databases, using the discrete empirical interpolation method (DEIM) and an incremental QR factorization. In doing so, we demonstrate that CUR is able to identify beat morphologies that are representative of the data set, including rarely occurring beat events, providing a robust summarization of the ECG data. We also find that using the CUR-selected beats to label the remaining unselected beats via 1-nearest neighbor classification produces results comparable to those presented in other works. While the electrocardiogram is of particular interest here, this work demonstrates the utility of CUR in detecting representative subsequences in quasiperiodic physiological time series.
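As a rough illustration of the approach summarized above, the following Python sketch selects rows and columns with DEIM and forms a CUR approximation. It is a minimal sketch under stated assumptions, not the implementation used in this work: the singular vectors come from a full SVD (`numpy.linalg.svd`) in place of the incremental QR factorization, the middle factor is formed with pseudoinverses, each column of the hypothetical data matrix `A` is assumed to hold one beat, and the function names `deim`, `cur_deim`, and `label_by_nearest_selected` are illustrative only.

```python
import numpy as np


def deim(V):
    """DEIM index selection on the columns of V (n x k, full column rank)."""
    n, k = V.shape
    idx = np.zeros(k, dtype=int)
    # First index: largest-magnitude entry of the first vector.
    idx[0] = np.argmax(np.abs(V[:, 0]))
    for j in range(1, k):
        # Interpolate vector j at the indices chosen so far ...
        c = np.linalg.solve(V[idx[:j], :j], V[idx[:j], j])
        # ... and pick the entry where the interpolation residual is largest.
        r = V[:, j] - V[:, :j] @ c
        idx[j] = np.argmax(np.abs(r))
    return idx


def cur_deim(A, k):
    """Rank-k CUR approximation of A with DEIM-selected rows and columns.

    Assumption: a full SVD is used here for simplicity; it stands in for
    the incremental QR factorization mentioned in the text.
    """
    W, s, Zt = np.linalg.svd(A, full_matrices=False)
    p = deim(W[:, :k])       # row indices from left singular vectors
    q = deim(Zt[:k, :].T)    # column indices from right singular vectors
    C = A[:, q]              # actual columns of A (e.g., selected beats)
    R = A[p, :]              # actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return p, q, C, U, R


def label_by_nearest_selected(A, q, labels_q):
    """Label every column of A with the label of its nearest selected column.

    A 1-nearest-neighbor rule in Euclidean distance; labels_q[i] is the
    (hypothetical) annotation of column q[i].
    """
    C = A[:, q]
    # D[i, j] = distance between column i of A and selected column j.
    D = np.linalg.norm(A[:, :, None] - C[:, None, :], axis=0)
    return np.asarray(labels_q)[np.argmin(D, axis=1)]
```

With beats stored as columns, `q` indexes the representative beats, and passing their annotations as `labels_q` propagates labels to the unselected beats. The Euclidean distance in the last function is an assumption; the abstract does not specify the metric.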
Keywords: CUR matrix factorization; Dimension reduction; Electrocardiogram; Temporal data analysis.
Copyright © 2017 Elsevier Inc. All rights reserved.