Background: Screening for α-thalassemia carriers by low HbA2 has a low positive predictive value (PPV), only 40.97% in our laboratory, so more effective screening methods are needed. This study aimed to develop a machine learning model that uses red blood cell parameters to identify α-thalassemia carriers among patients with low HbA2.
Methods: Laboratory data from 1213 patients with low HbA2 were used for modeling and randomly divided into a training set (849 of 1213, 70%) and an internal validation set (364 of 1213, 30%). In addition, an external data set (n = 399) was used for model validation. Fourteen machine learning methods were applied to construct a discriminant model. Performance was evaluated using accuracy, sensitivity, specificity, and other metrics, and compared with seven previously published discriminant function formulae.
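The 70/30 random split described in the Methods can be illustrated with a minimal sketch. It assumes a plain shuffle-and-cut procedure and is not the authors' code; the function name `split_train_validation`, the seed, and the placeholder patient identifiers are all hypothetical.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and cut it into training and validation lists."""
    shuffled = list(records)
    # A fixed seed makes the random split reproducible (the seed value is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 1213 low-HbA2 patient records.
patient_ids = list(range(1213))
train_ids, validation_ids = split_train_validation(patient_ids)
print(len(train_ids), len(validation_ids))  # 849 364
```

Rounding 70% of 1213 gives the 849 training records reported above, which leaves 364 for internal validation.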
Results: The optimal model was based on random forest with 5 clinical features. The PPV of the model was more than twice that of low HbA2 screening, and the model also had a high negative predictive value (NPV). Compared with the seven formulae for screening α-thalassemia carriers, the model had better accuracy (0.915), specificity (0.967), NPV (0.901), PPV (0.942) and area under the receiver operating characteristic curve (AUC, 0.948) in the independent test set.
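For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, PPV and NPV follow from the four cells of a confusion matrix. The function name `screening_metrics` is hypothetical, and the counts are invented purely for illustration; they are not the study's data and do not reproduce the values reported above.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct calls over all cases
        "sensitivity": tp / (tp + fn),   # carriers correctly flagged
        "specificity": tn / (tn + fp),   # non-carriers correctly cleared
        "ppv": tp / (tp + fp),           # flagged cases that are true carriers
        "npv": tn / (tn + fn),           # cleared cases that are true non-carriers
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = screening_metrics(tp=80, fp=5, tn=95, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on how common carriers are in the screened group, which is why a marker can be sensitive and still have a low PPV.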
Conclusion: Use of a random forest-based model enables rapid discrimination of α-thalassemia carriers from low HbA2 cases.
Keywords: Discriminant model; Low HbA2 cases; Machine learning; Red blood cell parameters; α-thalassemia carrier.
Copyright © 2021 Elsevier B.V. All rights reserved.