Explainable Machine Learning for Atrial Fibrillation in the General Population Using a Generalized Additive Model - A Cross-Sectional Study

Masaki Kawakami; Shigehiro Karashima; Kento Morita; Hayato Tada; Hirofumi Okada; Daisuke Aono; Mitsuhiro Kometani; Akihiro Nomura; Masashi Demura; Kenji Furukawa; Takashi Yoneda; Hidetaka Nambo; Masa-Aki Kawashiri

doi:10.1253/circrep.CR-21-0151

Circ Rep. 2021 Dec 28;4(2):73-82. doi: 10.1253/circrep.CR-21-0151. eCollection 2022 Feb 10.

Affiliations

¹ School of Electrical Information Communication Engineering, College of Science and Engineering, Kanazawa University, Kanazawa, Japan.
² Institute of Liberal Arts and Science, Kanazawa University, Kanazawa, Japan.
³ Department of Cardiovascular Medicine, Graduate School of Medical Science, Kanazawa University, Kanazawa, Japan.
⁴ Department of Endocrinology and Metabolism, Graduate School of Medical Science, Kanazawa University, Kanazawa, Japan.
⁵ Departments of Hygiene, Graduate School of Medical Science, Kanazawa University, Kanazawa, Japan.
⁶ Health Care Center, Japan Advanced Institute of Science and Technology, Nomi, Japan.
⁷ Institute of Transdisciplinary Sciences, Kanazawa University, Kanazawa, Japan.
⁸ Department of Health Promotion and Medicine of the Future, Graduate School of Medical Science, Kanazawa University, Kanazawa, Japan.

# Contributed equally.

Abstract

Background: Atrial fibrillation (AF) is the most common arrhythmia and is associated with increased thromboembolic stroke risk and heart failure. Although various prediction models for AF risk have been developed using machine learning, their output cannot be accurately explained to doctors and patients. Therefore, we developed an explainable model with high interpretability and accuracy accounting for the non-linear effects of clinical characteristics on AF incidence.

Methods and Results: Of the 489,073 residents who underwent specific health checkups between 2009 and 2018 and were registered in the Kanazawa Medical Association database, data were used for 5,378 subjects with AF and 167,950 subjects with normal electrocardiogram readings. Forty-seven clinical parameters were combined using a generalized additive model algorithm. We validated the model and found that the area under the curve, sensitivity, and specificity were 0.964, 0.879, and 0.920, respectively. The 9 most important variables were the physical examination of arrhythmia, a medical history of coronary artery disease, age, hematocrit, γ-glutamyl transpeptidase, creatinine, hemoglobin, systolic blood pressure, and HbA1c. Further, non-linear relationships of clinical variables to the probability of AF diagnosis were visualized.

Conclusions: We established a novel AF risk explanation model with high interpretability and accuracy accounting for non-linear information obtained at general health checkups. This model contributes not only to more accurate AF risk prediction, but also to a greater understanding of the effects of each characteristic.
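As a rough illustration of why a generalized additive model is considered explainable, the sketch below expresses the predicted log-odds as an intercept plus one shape function per variable, so each variable's contribution can be read off separately. It also shows how sensitivity, specificity, and the area under the curve are computed from labels and scores. This is not the study's implementation: the shape functions, knot values, intercept, and function names are all hypothetical and invented for illustration, and the study's fitted model and fitting procedure are not reproduced here.

```python
import math

def piecewise_linear(x, knots):
    """Interpolate linearly through (value, contribution) knots; clamp outside the range."""
    if x <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return knots[-1][1]

# Hypothetical shape functions (log-odds contributions). Values are invented,
# not taken from the study.
SHAPES = {
    "age": [(40, -1.0), (60, 0.0), (80, 1.5)],
    "systolic_bp": [(100, 0.2), (120, 0.0), (160, 0.4)],
}
INTERCEPT = -3.0  # hypothetical baseline log-odds

def explain(record):
    """Return the predicted probability and the per-variable contributions."""
    contributions = {
        name: piecewise_linear(record[name], knots) for name, knots in SHAPES.items()
    }
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions

def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive, then compute both rates."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Calling `explain({"age": 70, "systolic_bp": 120})` returns a probability together with a dictionary of per-variable log-odds contributions, which is the kind of per-characteristic explanation the abstract refers to. Plotting each shape function against its variable would give the non-linear relationship curves.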

Keywords: Atrial fibrillation; General population; Generalized additive model; Machine learning; Prediction.