An online alpha-thalassemia carrier discrimination model based on random forest and red blood cell parameters for low HbA2 cases

Pinning Feng; Yuzhe Li; Zhihao Liao; Zhenrong Yao; Wenbin Lin; Shuhua Xie; Beini Hu; Chencui Huang; Wei Liu; Hongxu Xu; Min Liu; Wenjia Gan

doi:10.1016/j.cca.2021.12.003

Clin Chim Acta. 2022 Jan 15:525:1-5. doi: 10.1016/j.cca.2021.12.003. Epub 2021 Dec 6.

Authors

Pinning Feng¹, Yuzhe Li², Zhihao Liao², Zhenrong Yao¹, Wenbin Lin¹, Shuhua Xie¹, Beini Hu³, Chencui Huang³, Wei Liu³, Hongxu Xu¹, Min Liu⁴, Wenjia Gan⁵

Affiliations

¹ Department of Clinical Laboratory, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
² Department of Clinical Laboratory, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
³ R&D Center, Beijing Deepwise & League of PHD Technology Co., Ltd, Beijing, China.
⁴ Department of Clinical Laboratory, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China. Electronic address: liumin@mail.sysu.edu.cn.
⁵ Department of Clinical Laboratory, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China. Electronic address: ganwj3@mail.sysu.edu.cn.

PMID: 34883090
DOI: 10.1016/j.cca.2021.12.003

Abstract

Background: Screening for α-thalassemia carriers by low HbA₂ has a low positive predictive value (PPV), which was as low as 40.97% in our laboratory, so more effective screening methods are needed. This study aimed to develop a machine learning model that uses red blood cell parameters to identify α-thalassemia carriers among patients with low HbA₂.

Methods: Laboratory data from 1213 patients with low HbA₂ were used for modeling and randomly divided into a training set (849 of 1213, 70%) and an internal validation set (364 of 1213, 30%). In addition, an external data set (n = 399) was used for model validation. Fourteen machine learning methods were applied to construct a discriminant model. Performance was evaluated by accuracy, sensitivity, specificity, and other metrics, and compared with that of 7 previously published discriminant function formulae.
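The 70/30 random split described in the Methods can be illustrated with a minimal sketch. The function name `split_indices`, the fixed seed, and the use of plain shuffling are assumptions made for illustration; the abstract does not say how the authors performed the randomization or whether it was stratified.

```python
import random


def split_indices(n_total, train_fraction, seed=0):
    """Shuffle the indices 0..n_total-1 and cut them into two disjoint lists.

    Hypothetical helper: the seed and the unstratified shuffle are
    illustrative assumptions, not the authors' procedure.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = round(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


# 1213 low-HbA2 patients, 70% for training, the rest for internal validation.
train_idx, valid_idx = split_indices(1213, 0.70)
print(len(train_idx), len(valid_idx))  # 849 364
```

The resulting sizes match the 849 and 364 patients reported for the training and internal validation sets.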

Results: The optimal model was based on random forest with 5 clinical features. The PPV of the model was more than twice the PPV of HbA₂, and the model also had a high negative predictive value (NPV). Compared with the 7 formulae for screening α-thalassemia carriers, the model had better accuracy (0.915), specificity (0.967), NPV (0.901), PPV (0.942) and area under the receiver operating characteristic curve (AUC, 0.948) in the independent test set.
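For readers less familiar with the screening metrics reported in the Results, the following sketch shows how each is derived from a 2×2 confusion matrix. The counts passed to `screening_metrics` are hypothetical and chosen only for illustration, since the abstract does not report the underlying confusion matrix. The final ratio simply divides the two PPV figures quoted in the abstract (0.942 and 40.97%); the abstract does not state which data set the "more than twice" comparison refers to, so pairing these two numbers is an assumption.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # carriers correctly flagged
        "specificity": tn / (tn + fp),   # non-carriers correctly cleared
        "ppv": tp / (tp + fp),           # flagged cases that are carriers
        "npv": tn / (tn + fn),           # cleared cases that are non-carriers
    }


# Hypothetical counts, NOT taken from the study.
metrics = screening_metrics(tp=80, fp=5, tn=95, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Ratio of the two PPV values quoted in the abstract.
ppv_ratio = 0.942 / 0.4097
print(f"PPV ratio: {ppv_ratio:.2f}")
```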

Conclusion: Use of a random forest-based model enables rapid discrimination of α-thalassemia carriers from low HbA₂ cases.

Keywords: Discriminant model; Low HbA₂ cases; Machine learning; Red blood cell parameters; α-thalassemia carrier.

MeSH terms

Erythrocytes / chemistry
Hemoglobin A2 / analysis
Humans
Mass Screening
alpha-Thalassemia* / diagnosis
alpha-Thalassemia* / genetics
beta-Thalassemia*

Substances

Hemoglobin A2