DeepThal: A Deep Learning-Based Framework for the Large-Scale Prediction of the α+-Thalassemia Trait Using Red Blood Cell Parameters

Krittaya Phirom; Phasit Charoenkwan; Watshara Shoombuatong; Pimlak Charoenkwan; Supatra Sirichotiyakul; Theera Tongsong

doi:10.3390/jcm11216305

J Clin Med. 2022 Oct 26;11(21):6305. doi: 10.3390/jcm11216305.

Authors

Krittaya Phirom¹, Phasit Charoenkwan², Watshara Shoombuatong³, Pimlak Charoenkwan⁴,⁵, Supatra Sirichotiyakul¹,⁵, Theera Tongsong¹,⁵

Affiliations

¹ Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand.
² Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
³ Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
⁴ Department of Pediatrics, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand.
⁵ Thalassemia and Hematology Center, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand.

Abstract

Objectives: To develop a machine learning (ML)-based framework that uses red blood cell (RBC) parameters to predict the α+-thalassemia trait (α+-thal trait), and to compare its diagnostic performance with that of a conventional method based on a single RBC parameter or a combination of RBC parameters.
Methods: A retrospective study was conducted on couples possibly at risk of having a fetus with hemoglobin H (Hb H) disease. Subjects with molecularly confirmed normal status (no thalassemia), the α+-thal trait, or a two-allele α-thalassemia mutation were included. Clinical parameters (age and gender) and RBC parameters (Hb, Hct, MCV, MCH, MCHC, RDW, and RBC count) from their antenatal thalassemia screening were retrieved and analyzed using the ML-based framework and a conventional method. The performance of α+-thal trait prediction was then evaluated.
Results: In total, 594 cases (female/male: 330/264; mean age: 29.7 ± 6.6 years) were included in the analysis: 229 normal controls, 160 cases with the α+-thalassemia trait, and 205 cases with a two-allele α-thalassemia mutation. The ML-derived model improved the diagnostic performance, giving a sensitivity of 80% and a specificity of 81%. On the independent test dataset, DeepThal outperformed the other ML-based methods, with an accuracy of 80.77%, a sensitivity of 70.59%, and a Matthews correlation coefficient (MCC) of 0.608. Among the RBC parameters, MCH < 28.95 pg was the best single predictor of the α+-thal trait, with an AUC of 0.857 (95% CI: 0.816–0.899). The combination model derived from binary logistic regression analysis performed better, with an AUC of 0.868 (95% CI: 0.830–0.906), a sensitivity of 80.1%, and a specificity of 75.1%.
Conclusions: The performance of DeepThal on the independent test dataset indicates that it can accurately predict the α+-thal trait. DeepThal is expected to be a useful tool for the scientific community in the large-scale prediction of the α+-thal trait.
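
As an illustration of how the reported measures relate to one another, the following minimal Python sketch applies the single-parameter rule from the abstract (MCH < 28.95 pg flags a possible α+-thal trait) and computes accuracy, sensitivity, specificity, and MCC from the resulting confusion counts. It is not the DeepThal model or the authors' code. The function names and the eight MCH values and labels are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Cut-off taken from the abstract; everything else below is hypothetical.
MCH_CUTOFF_PG = 28.95


def mch_screen_positive(mch_pg: float) -> bool:
    """Flag a sample as screen-positive when MCH is below the cut-off."""
    return mch_pg < MCH_CUTOFF_PG


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for two equal-length sequences of booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, tn, fn


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity, and MCC from counts."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        # MCC is conventionally set to 0 when the denominator is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy data: MCH in pg, and whether the trait is truly present.
mch_values = [26.1, 27.8, 29.4, 30.2, 28.5, 31.0, 27.0, 29.0]
has_trait = [True, True, False, False, False, False, True, True]

predictions = [mch_screen_positive(v) for v in mch_values]
counts = confusion_counts(has_trait, predictions)
metrics = diagnostic_metrics(*counts)
print(counts)
print(metrics)
```

On this toy data the rule produces three true positives, one false positive, three true negatives, and one false negative. These numbers illustrate the formulas only and say nothing about the study's results.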

Keywords: alpha plus-thalassemia; hemoglobin H disease; machine learning; red blood cell indices; screening.
