Deep-learning model for prenatal congenital heart disease screening generalizes to community setting and outperforms clinical detection

Ultrasound Obstet Gynecol. 2024 Jan;63(1):44-52. doi: 10.1002/uog.27503.

Abstract

Objectives: Despite nearly universal prenatal ultrasound screening programs, congenital heart defects (CHD) are still missed, which may result in severe morbidity or even death. Deep machine learning (DL) can automate image recognition from ultrasound. The main aim of this study was to assess the performance of a previously developed DL model, trained on images from a tertiary center, using fetal ultrasound images obtained during the second-trimester standard anomaly scan in a low-risk population. A secondary aim was to compare initial screening diagnosis, which made use of live imaging at the point-of-care, with diagnosis by clinicians evaluating only stored images.

Methods: All pregnancies with isolated severe CHD in the Northwestern region of The Netherlands between 2015 and 2016 with available stored images were evaluated, as well as a sample of normal fetuses' examinations from the same region and time period. We compared the accuracy of the initial clinical diagnosis (made in real time with access to live imaging) with that of the model (which had only stored imaging available) and with the performance of three blinded human experts who, like the model, had access only to the stored images. We analyzed performance according to ultrasound study characteristics, such as duration and quality (scored independently by investigators), number of stored images and availability of screening views.

Results: A total of 42 normal fetuses and 66 cases of isolated CHD at birth were analyzed. Of the abnormal cases, 31 were missed and 35 were detected at the time of the clinical anatomy scan (sensitivity, 53%). Model sensitivity and specificity were 91% and 78%, respectively. Blinded human experts (n = 3) achieved mean ± SD sensitivity and specificity of 55 ± 10% (range, 47-67%) and 71 ± 13% (range, 57-83%), respectively. There was a statistically significant difference in model correctness according to expert-graded image quality (P = 0.03). The abnormal cases included 19 lesions that the model had not encountered during its training; the model's performance in these cases (16/19 correct) was not statistically significantly different from that for previously encountered lesions (P = 0.41).
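The clinical sensitivity reported above follows directly from the stated counts (35 detected of 66 CHD cases). The sketch below reproduces that arithmetic, along with the fraction of previously unencountered lesions the model classified correctly (16/19). The Wilson score interval is an illustrative addition, not a method reported in the abstract, and the function names are hypothetical.

```python
from math import sqrt


def sensitivity(true_positives, false_negatives):
    """Fraction of truly abnormal cases that were flagged as abnormal."""
    return true_positives / (true_positives + false_negatives)


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumption: the abstract does not state which interval method, if any,
    the authors used; this is shown only to indicate sampling uncertainty.
    """
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the Results paragraph.
detected, missed = 35, 31            # CHD cases detected / missed at the clinical scan
unseen_correct, unseen_total = 16, 19  # lesion types absent from model training

clinical_sens = sensitivity(detected, missed)
low, high = wilson_interval(detected, detected + missed)
unseen_fraction = unseen_correct / unseen_total

print(f"Clinical sensitivity: {clinical_sens:.1%} (Wilson 95% CI {low:.1%} to {high:.1%})")
print(f"Model correct on unseen lesions: {unseen_fraction:.1%}")
```

With only 66 abnormal cases, the interval around the clinical sensitivity is wide, which is worth keeping in mind when comparing it with the model's reported 91%.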

Conclusions: A previously trained DL algorithm had higher sensitivity than initial clinical assessment in detecting CHD in a cohort in which over 50% of CHD cases were initially missed clinically. Notably, the DL algorithm performed well on community-acquired images in a low-risk population, including lesions to which it had not been exposed previously. Furthermore, when both the model and blinded human experts had access to only stored images and not the full range of images available to a clinician during a live scan, the model outperformed the human experts. Together, these findings support the proposition that use of DL models can improve prenatal detection of CHD. © 2023 International Society of Ultrasound in Obstetrics and Gynecology.

Keywords: artificial intelligence; congenital heart disease; fetal screening; machine learning; ultrasound.

MeSH terms

  • Deep Learning*
  • Female
  • Heart Defects, Congenital* / diagnostic imaging
  • Heart Defects, Congenital* / epidemiology
  • Humans
  • Infant, Newborn
  • Pregnancy
  • Prenatal Diagnosis / methods
  • Sensitivity and Specificity
  • Ultrasonography, Prenatal / methods