An online alpha-thalassemia carrier discrimination model based on random forest and red blood cell parameters for low HbA2 cases

Clin Chim Acta. 2022 Jan 15:525:1-5. doi: 10.1016/j.cca.2021.12.003. Epub 2021 Dec 6.

Abstract

Background: Screening for α-thalassemia carriers by low HbA2 has a low positive predictive value (PPV), which was only 40.97% in our laboratory, so more effective screening methods are needed. This study aimed to develop a machine learning model that uses red blood cell parameters to identify α-thalassemia carriers among patients with low HbA2.

Methods: Laboratory data from 1213 patients with low HbA2 were randomly divided into a training set (849 of 1213, 70%) and an internal validation set (364 of 1213, 30%). In addition, an external data set (n = 399) was used for model validation. Fourteen machine learning methods were applied to construct a discriminant model. Performance was evaluated by accuracy, sensitivity, specificity, and other metrics, and compared with that of 7 previously published discriminant function formulae.
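
The 70/30 random split described in the Methods can be illustrated with a minimal sketch. The function name `split_train_validation`, the seed, and the use of integer patient identifiers are assumptions made for illustration; the abstract does not state how the authors performed the split.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle records reproducibly, then cut them into training and validation sets.

    Hypothetical helper: the seed and the truncating cut-off are illustrative choices.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 1213 * 0.7 truncates to 849
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in identifiers for the 1213 low-HbA2 patients (no real data).
patient_ids = range(1213)
train, validation = split_train_validation(patient_ids)
print(len(train), len(validation))  # 849 364
```

With 1213 records and a fraction of 0.7, the cut falls at 849, which reproduces the set sizes reported above.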

Results: The optimal model was based on random forest with 5 clinical features. The PPV of the model was more than twice that of low HbA2 screening, and the model also had a high negative predictive value (NPV). Compared with the seven formulae for screening α-thalassemia carriers, the model achieved better accuracy (0.915), specificity (0.967), NPV (0.901), PPV (0.942) and area under the receiver operating characteristic curve (AUC, 0.948) in the independent test set.
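
For readers less familiar with the screening metrics reported here, the following sketch shows how PPV, NPV, sensitivity, specificity and accuracy follow from the four cells of a confusion matrix. The function name `screening_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: carriers classified as positive/negative;
    fp/tn: non-carriers classified as positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # carriers correctly flagged
        "specificity": tn / (tn + fp),  # non-carriers correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are carriers
        "npv": tn / (tn + fn),          # cleared cases that are non-carriers
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = screening_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV depends on how many of the positive calls are true carriers, which is why a screen with many false positives, such as low HbA2 alone, can have a low PPV even when it misses few carriers.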

Conclusion: Use of a random forest-based model enables rapid discrimination of α-thalassemia carriers from low HbA2 cases.

Keywords: Discriminant model; Low HbA2 cases; Machine learning; Red blood cell parameters; α-thalassemia carrier.

MeSH terms

  • Erythrocytes / chemistry
  • Hemoglobin A2 / analysis
  • Humans
  • Mass Screening
  • alpha-Thalassemia* / diagnosis
  • alpha-Thalassemia* / genetics
  • beta-Thalassemia*

Substances

  • Hemoglobin A2