Explainable Machine Learning for Atrial Fibrillation in the General Population Using a Generalized Additive Model - A Cross-Sectional Study

Circ Rep. 2021 Dec 28;4(2):73-82. doi: 10.1253/circrep.CR-21-0151. eCollection 2022 Feb 10.

Abstract

Background: Atrial fibrillation (AF) is the most common arrhythmia and is associated with increased thromboembolic stroke risk and heart failure. Although various prediction models for AF risk have been developed using machine learning, their output cannot be accurately explained to doctors and patients. Therefore, we developed an explainable model with high interpretability and accuracy accounting for the non-linear effects of clinical characteristics on AF incidence.

Methods and Results: Of the 489,073 residents who underwent specific health checkups between 2009 and 2018 and were registered in the Kanazawa Medical Association database, data were used for 5,378 subjects with AF and 167,950 subjects with normal electrocardiogram readings. Forty-seven clinical parameters were combined using a generalized additive model algorithm. On validation, the model's area under the curve, sensitivity, and specificity were 0.964, 0.879, and 0.920, respectively. The 9 most important variables were arrhythmia detected on physical examination, a medical history of coronary artery disease, age, hematocrit, γ-glutamyl transpeptidase, creatinine, hemoglobin, systolic blood pressure, and HbA1c. Further, the non-linear relationships between clinical variables and the probability of AF diagnosis were visualized.

Conclusions: We established a novel AF risk explanation model with high interpretability and accuracy accounting for non-linear information obtained at general health checkups. This model contributes not only to more accurate AF risk prediction, but also to a greater understanding of the effects of each characteristic.
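
For readers unfamiliar with the technique, the following is a minimal sketch of how a generalized additive model produces an explainable prediction, and how sensitivity, specificity, and area under the curve are computed from its output. It is not the study's fitted model. The two variables (age and systolic blood pressure) are taken from the list of important variables above, but the shape functions `f_age` and `f_sbp`, the intercept, and the toy labels in the usage note are assumptions invented for illustration. In a real model each shape function is learned from data.

```python
import math

# Hypothetical shape functions, one per variable. A fitted GAM learns these
# curves from data; the forms and coefficients below are invented.
def f_age(age):
    # Assumed: no contribution up to 50 years, then a linear rise.
    return 0.08 * max(0.0, age - 50.0)

def f_sbp(sbp):
    # Assumed: U-shaped contribution centred on 120 mmHg.
    return 0.0005 * (sbp - 120.0) ** 2

INTERCEPT = -4.0  # assumed baseline log-odds

def predict_proba(age, sbp):
    """Sum the per-variable contributions on the log-odds scale,
    then map the total to a probability with the logistic function."""
    log_odds = INTERCEPT + f_age(age) + f_sbp(sbp)
    return 1.0 / (1.0 + math.exp(-log_odds))

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs in which the positive case
    receives the higher score; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the prediction is a sum of one-variable terms, each term can be plotted against its variable to show that variable's contribution in isolation, which is the kind of visualization the abstract refers to. As a usage example with made-up labels, `auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` evaluates to 0.75, since the positive case outranks the negative one in three of the four pairs.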

Keywords: Atrial fibrillation; General population; Generalized additive model; Machine learning; Prediction.