Background: Globally, preterm birth is the leading cause of neonatal death, with estimated prevalence and associated mortality highest in low- and middle-income countries (LMICs). Accurate identification of preterm infants is important at the individual level for appropriate clinical intervention and at the population level for informed policy decisions and resource allocation. Because early prenatal ultrasound is commonly unavailable in these settings, gestational age (GA) is often estimated using newborn assessment at birth. This approach assumes that last menstrual period is unreliable and that birth weight cannot distinguish preterm infants from those who are small for gestational age (SGA). We sought to leverage machine learning algorithms incorporating maternal factors associated with SGA to improve the accuracy of preterm newborn identification in LMIC settings.
Methods and findings: This study uses data from an ongoing obstetrical cohort in Lusaka, Zambia, in which GA is estimated by early pregnancy ultrasound. Our intent was to identify the best set of parameters commonly available at delivery to correctly categorize births as either preterm (<37 weeks) or term, using GA assigned by early ultrasound as the gold standard. Trained midwives conducted a newborn assessment (<72 hours) and collected maternal and neonatal data at the time of delivery or shortly thereafter. New Ballard Score (NBS), last menstrual period (LMP), and birth weight were each used individually to assign GA at delivery and categorize each birth as either preterm or term. Additionally, machine learning techniques combined these measures with several maternal and newborn characteristics associated with prematurity and SGA to develop models predicting GA at delivery and preterm birth. The distribution and accuracy of all models were compared to early ultrasound dating. Within our live-born cohort to date (n = 862), the median GA at delivery by early ultrasound was 39.4 weeks (IQR: 38.3-40.3). Among assessed newborns with complete data included in this analysis (n = 468), the median GA by ultrasound was 39.6 weeks (IQR: 38.4-40.3). We identified a combination of six accessible parameters (LMP, birth weight, twin delivery, maternal height, hypertension in labor, and HIV serostatus) that machine learning models can use to outperform current GA prediction methods. For preterm birth prediction, this combination of covariates correctly classified >94% of newborns and achieved an area under the curve (AUC) of 0.9796.
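The evaluation logic described above (dichotomizing GA at 37 weeks, then scoring each method against ultrasound dating by classification accuracy and AUC) can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names and the six GA values are hypothetical, and the AUC is computed with a simple pairwise rank comparison rather than whatever software the authors used.

```python
# Illustrative sketch only: names and data below are hypothetical,
# not taken from the study.

PRETERM_THRESHOLD_WEEKS = 37.0  # preterm is defined as GA < 37 weeks


def is_preterm(ga_weeks):
    """Return True if a gestational age in weeks falls below the preterm cutoff."""
    return ga_weeks < PRETERM_THRESHOLD_WEEKS


def accuracy(reference_ga, estimated_ga):
    """Fraction of births whose preterm/term category matches the reference."""
    pairs = list(zip(reference_ga, estimated_ga))
    agree = sum(is_preterm(ref) == is_preterm(est) for ref, est in pairs)
    return agree / len(pairs)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen preterm birth receives a
    higher score than a randomly chosen term birth (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one preterm and one term birth")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical GA values in weeks: ultrasound reference vs. a model's estimate.
ultrasound_ga = [39.4, 35.1, 40.2, 36.5, 38.0, 33.8]
model_ga = [39.0, 36.2, 40.5, 37.3, 38.4, 34.5]

labels = [is_preterm(ga) for ga in ultrasound_ga]
# A lower estimated GA means "more likely preterm", so negate it to get a score.
scores = [-ga for ga in model_ga]

model_accuracy = accuracy(ultrasound_ga, model_ga)
model_auc = auc(labels, scores)
print(f"accuracy={model_accuracy:.3f}, AUC={model_auc:.3f}")
```

Accuracy depends on the fixed 37-week cutoff, whereas AUC summarizes how well the estimate ranks preterm births ahead of term births across all possible cutoffs, which is why the two figures can differ for the same model.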
Conclusions: We identified a parsimonious list of variables that machine learning approaches can use to improve the accuracy of preterm newborn identification. Our best-performing model included LMP, birth weight, twin delivery, HIV serostatus, and maternal factors associated with SGA. These variables are all easily collected at delivery, reducing the skill and time required of frontline health workers to assess GA.
Trial registration: ClinicalTrials.gov Identifier: NCT02738892.