Deep-learning model for prenatal congenital heart disease screening generalizes to community setting and outperforms clinical detection

C Athalye; A van Nisselrooij; S Rizvi; M C Haak; A J Moon-Grady; R Arnaout

doi:10.1002/uog.27503

Ultrasound Obstet Gynecol. 2024 Jan;63(1):44-52. doi: 10.1002/uog.27503.

Authors

C Athalye¹, A van Nisselrooij², S Rizvi¹, M C Haak², A J Moon-Grady³, R Arnaout¹ ³ ⁴

Affiliations

¹ Division of Cardiology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
² Department of Obstetrics, Division of Fetal Medicine, Leiden University Medical Center, Leiden, The Netherlands.
³ Department of Pediatrics, Division of Cardiology, University of California, San Francisco, San Francisco, CA, USA.
⁴ Bakar Computational Health Sciences Institute; Department of Radiology; UCSF Berkeley Joint Program in Computational Precision Health; Center for Intelligent Imaging; Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA, USA.

PMID: 37774040
PMCID: PMC10841849 (available on 2025-01-01)
DOI: 10.1002/uog.27503

Abstract

Objectives: Despite nearly universal prenatal ultrasound screening programs, congenital heart defects (CHD) are still missed, which may result in severe morbidity or even death. Deep machine learning (DL) can automate image recognition from ultrasound. The main aim of this study was to assess the performance of a previously developed DL model, trained on images from a tertiary center, using fetal ultrasound images obtained during the second-trimester standard anomaly scan in a low-risk population. A secondary aim was to compare initial screening diagnosis, which made use of live imaging at the point-of-care, with diagnosis by clinicians evaluating only stored images.

Methods: All pregnancies with isolated severe CHD in the Northwestern region of The Netherlands between 2015 and 2016 with available stored images were evaluated, as well as a sample of normal fetuses' examinations from the same region and time period. We compared the accuracy of the initial clinical diagnosis (made in real time with access to live imaging) with that of the model (which had only stored imaging available) and with the performance of three blinded human experts who had access only to the stored images (like the model). We analyzed performance according to ultrasound study characteristics, such as duration and quality (scored independently by investigators), number of stored images and availability of screening views.

Results: A total of 42 normal fetuses and 66 cases of isolated CHD at birth were analyzed. Of the abnormal cases, 31 were missed and 35 were detected at the time of the clinical anatomy scan (sensitivity, 53%). Model sensitivity and specificity were 91% and 78%, respectively. Blinded human experts (n = 3) achieved mean ± SD sensitivity and specificity of 55 ± 10% (range, 47-67%) and 71 ± 13% (range, 57-83%), respectively. There was a statistically significant difference in model correctness according to expert-graded image quality (P = 0.03). The abnormal cases included 19 lesions that the model had not encountered during its training; the model's performance in these cases (16/19 correct) was not statistically significantly different from that for previously encountered lesions (P = 0.41).
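The clinical sensitivity quoted above follows directly from the reported counts (35 detected, 31 missed). The short Python sketch below reproduces that arithmetic and, as an illustration only, attaches a Wilson 95% interval to it. The interval is not reported in the abstract; it is an assumption added here to show how much uncertainty a sample of 66 cases carries, and the function names are hypothetical.

```python
import math


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly abnormal cases that were flagged as abnormal."""
    return true_pos / (true_pos + false_neg)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%).

    Illustrative only: the abstract does not report confidence intervals.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the Results paragraph.
detected, missed = 35, 31
clinical_sens = sensitivity(detected, missed)
low, high = wilson_interval(detected, detected + missed)

# Lesion types the model had not seen in training: 16 of 19 classified correctly.
novel_correct, novel_total = 16, 19
novel_fraction = novel_correct / novel_total

print(f"Clinical sensitivity: {clinical_sens:.0%}")
print(f"Illustrative Wilson 95% interval: {low:.2f} to {high:.2f}")
print(f"Correct on previously unseen lesions: {novel_fraction:.0%}")
```

The model's own true-positive and true-negative counts are not given in the abstract, so the sketch does not attempt to back-calculate them from the reported 91% and 78%.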

Conclusions: A previously trained DL algorithm had higher sensitivity than initial clinical assessment in detecting CHD in a cohort in which over 50% of CHD cases were initially missed clinically. Notably, the DL algorithm performed well on community-acquired images in a low-risk population, including lesions to which it had not been exposed previously. Furthermore, when both the model and blinded human experts had access to only stored images and not the full range of images available to a clinician during a live scan, the model outperformed the human experts. Together, these findings support the proposition that use of DL models can improve prenatal detection of CHD. © 2023 International Society of Ultrasound in Obstetrics and Gynecology.

Keywords: artificial intelligence; congenital heart disease; fetal screening; machine learning; ultrasound.

MeSH terms

Deep Learning*
Female
Heart Defects, Congenital* / diagnostic imaging
Heart Defects, Congenital* / epidemiology
Humans
Infant, Newborn
Pregnancy
Prenatal Diagnosis / methods
Sensitivity and Specificity
Ultrasonography, Prenatal / methods
