Development and validation of a machine-learning model for prediction of shoulder dystocia

Ultrasound Obstet Gynecol. 2020 Oct;56(4):588-596. doi: 10.1002/uog.21878.


Objectives: To develop a machine-learning (ML) model for prediction of shoulder dystocia (ShD) and to externally validate the model's predictive accuracy and potential clinical efficacy in optimizing the use of Cesarean delivery in the context of suspected macrosomia.

Methods: We used electronic health records (EHR) from the Sheba Medical Center in Israel to develop the model (derivation cohort) and EHR from the University of California San Francisco Medical Center to validate the model's accuracy and clinical efficacy (validation cohort). After inclusion and exclusion criteria were applied, the derivation cohort included 686 singleton vaginal deliveries, of which 131 were complicated by ShD, and the validation cohort included 2584 deliveries, of which 31 were complicated by ShD. For each of these deliveries, we collected maternal and neonatal delivery outcomes together with maternal demographics, obstetric clinical data and sonographic fetal biometry. Biometric measurements and the estimated fetal weight derived from them were adjusted for gestational age at delivery (adjusted estimated fetal weight, aEFW). An ML pipeline was used to develop the model.
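
The cohort counts above imply very different ShD prevalences in the two cohorts, which is worth keeping in mind when comparing results across them. The short sketch below only restates the arithmetic on the reported counts; the names `cohorts` and `prevalence` are illustrative, and the abstract does not say why the derivation cohort contains a higher share of ShD cases.

```python
# Cohort sizes and ShD counts as reported in the Methods.
cohorts = {
    "derivation": {"deliveries": 686, "shd": 131},
    "validation": {"deliveries": 2584, "shd": 31},
}


def prevalence(cohort):
    """Share of deliveries in the cohort that were complicated by ShD."""
    return cohort["shd"] / cohort["deliveries"]


for name, cohort in cohorts.items():
    print(f"{name}: {prevalence(cohort):.1%}")
# derivation: 19.1%
# validation: 1.2%
```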

Results: In the derivation cohort, the ML model provided significantly better prediction than did the current clinical paradigm based on fetal weight and maternal diabetes: using nested cross-validation, the area under the receiver-operating-characteristics curve (AUC) of the model was 0.793 ± 0.041, outperforming aEFW combined with diabetes (AUC = 0.745 ± 0.044, P = 1e-16). The following risk modifiers had a positive beta that was > 0.02, i.e. they increased the risk of ShD: aEFW (beta = 0.164), pregestational diabetes (beta = 0.047), prior ShD (beta = 0.04), female fetal sex (beta = 0.04) and adjusted abdominal circumference (beta = 0.03). The following risk modifiers had a negative beta that was < -0.02, i.e. they were protective against ShD: adjusted biparietal diameter (beta = -0.08) and maternal height (beta = -0.03). In the validation cohort, the model outperformed aEFW combined with diabetes (AUC = 0.866 vs 0.784, P = 0.00007). Additionally, in the validation cohort, among the subgroup of 273 women carrying a fetus with aEFW ≥ 4000 g, the aEFW had no predictive power (AUC = 0.548), and the model performed significantly better (AUC = 0.775, P = 0.0002). A risk-score threshold of 0.5 stratified 42.9% of deliveries to the high-risk group, which included 90.9% of ShD cases and all cases accompanied by maternal or newborn complications. A more specific threshold of 0.7 stratified only 27.5% of the deliveries to the high-risk group, which included 63.6% of ShD cases and all those accompanied by newborn complications.
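
The two kinds of figure reported above, an AUC and the share of deliveries and ShD cases that fall at or above a risk-score threshold, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auc` and `stratify` and the ten toy scores are assumptions made purely for illustration, and the AUC is computed by its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen non-case).

```python
def auc(scores, labels):
    """AUC as P(case score > non-case score), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratify(scores, labels, threshold):
    """Return (fraction of deliveries flagged high risk, fraction of cases captured)."""
    n_cases = sum(labels)
    if n_cases == 0:
        raise ValueError("need at least one case")
    flagged = [s >= threshold for s in scores]
    captured = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    return sum(flagged) / len(scores), captured / n_cases


# Hypothetical toy data: risk score and outcome (1 = ShD) for ten deliveries.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

print(round(auc(scores, labels), 3))   # 0.857
print(stratify(scores, labels, 0.5))   # half flagged, all three cases captured
print(stratify(scores, labels, 0.7))   # three flagged, two of three cases captured
```

Raising the threshold shrinks the high-risk group but also lets some cases fall below it, which is the trade-off between the 0.5 and 0.7 thresholds described in the Results.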

Conclusion: We developed an ML model for prediction of ShD and externally validated its performance in a different cohort. The model predicted ShD better than did estimated fetal weight either alone or combined with maternal diabetes, and was able to stratify the risk of ShD and neonatal injury in the context of suspected macrosomia. Copyright © 2019 ISUOG. Published by John Wiley & Sons Ltd.

Keywords: EHR; anthropometry; artificial intelligence; biometry; macrosomia.