Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors

Public Health Chall. 2025 Sep 29;4(4):e70135. doi: 10.1002/puh2.70135. eCollection 2025 Dec.

Abstract

Background: Childhood anaemia is a major public health challenge in Nigeria, with high prevalence among children under five. This study identifies key determinants of childhood anaemia and develops a predictive model using advanced machine learning techniques.

Methods: A total of 13,136 children aged 6-59 months from the 2018 National Demographic and Health Survey (NDHS) were analysed. Sixteen machine learning algorithms were evaluated for their ability to predict childhood anaemia from a wide range of individual, community and environmental factors. The Extra Trees (ET) classifier, which demonstrated the highest predictive performance, was used to identify the top 10 predictors of childhood anaemia. A fairness and demographic bias assessment framework was incorporated to evaluate the model's performance across regions, wealth index categories, ethnic groups and gender.
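As a rough illustration of the workflow described in the Methods, the sketch below fits an Extra Trees classifier with scikit-learn and ranks the ten most important features. It is not the authors' pipeline: the data are synthetic (generated with make_classification), the feature names, sample size and hyperparameters are placeholders, and the use of impurity-based importances is an assumption, since the abstract does not state how the predictors were ranked.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data (assumption: NOT the NDHS dataset).
X, y = make_classification(
    n_samples=2000, n_features=25, n_informative=8, random_state=0
)
# Hypothetical feature names, used only for display.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Hold out a stratified test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters are illustrative, not those of the study.
model = ExtraTreesClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Area under the ROC curve on the held-out set.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Rank features by impurity-based importance and keep the top 10.
order = np.argsort(model.feature_importances_)[::-1][:10]
top10 = [(feature_names[i], float(model.feature_importances_[i])) for i in order]

print(f"AUC: {auc:.3f}")
for name, importance in top10:
    print(f"{name}: {importance:.4f}")
```

Impurity-based importances are convenient because they come for free with the fitted ensemble, but they can favour high-cardinality features; permutation importance is a common alternative when that is a concern.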

Results: The ET classifier achieved an area under the curve (AUC) of 0.8319, an accuracy of 0.7565 and a recall of 0.7565. The top 10 predictors identified by the model included the number of under-five children in the household, birth order, child age, media access, maternal health-seeking behaviour, child gender, proximity to water, money problems, day land surface temperature and all population count. The demographic bias assessment revealed variations in model performance across different subgroups, with the lowest AUCs observed in the north-east region (0.79), the poorest wealth index category (0.80) and the Hausa/Fulani ethnic group (0.81).
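The subgroup comparison reported above can be illustrated with a per-group AUC calculation. The following is a minimal sketch under stated assumptions: the labels, scores and group names ("A", "B") are made-up toy values, the helper names auc_score and auc_by_group are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) definition rather than whatever tooling the authors used.

```python
from collections import defaultdict


def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when a group lacks
    either class, because AUC is undefined in that case.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each subgroup (e.g. region or wealth index)."""
    buckets = defaultdict(lambda: ([], []))
    for l, s, g in zip(labels, scores, groups):
        buckets[g][0].append(l)
        buckets[g][1].append(s)
    return {g: auc_score(ls, ss) for g, (ls, ss) in buckets.items()}


# Toy data (assumption: illustrative values only, not study results).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4, 0.1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

result = auc_by_group(labels, scores, groups)
print(result)  # {'A': 1.0, 'B': 0.5}
```

A gap between groups of this kind is what a demographic bias assessment looks for: the same model ranks cases well in one subgroup and no better than chance in another.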

Conclusion: This study shows that machine learning can accurately predict childhood anaemia in Nigeria and identify key risk factors, supporting targeted interventions. Future work should focus on refining models and integrating AI-based interventions to reduce anaemia.

Keywords: Nigeria; childhood anaemia; demographic bias; machine learning; predictive modelling; risk factors.