Performance of a Screening Mammography AI Algorithm Repurposed for Symptomatic Mammography in a Tertiary Outpatient Clinic

Diagnostics (Basel). 2026 Mar 25;16(7):984. doi: 10.3390/diagnostics16070984.

Abstract

Background/Objectives: The aim of this study was to evaluate the diagnostic accuracy of a commercial artificial intelligence (AI) algorithm, originally developed for screening mammography, when applied to symptomatic women presenting to a tertiary outpatient clinic. Methods: This single-center, retrospective diagnostic accuracy study included women who presented with breast symptoms to a tertiary outpatient clinic between January and June 2013 and underwent digital mammography. A U.S. Food and Drug Administration (FDA)-cleared AI algorithm was applied to all mammograms and generated continuous malignancy scores ranging from 1 to 100. Mammographic breast density was classified according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) by two experienced radiologists. Histopathology, when available, or otherwise a minimum of 2 years of clinical and imaging follow-up served as the reference standard. Diagnostic performance was assessed using receiver operating characteristic (ROC) analysis with calculation of the area under the curve (AUC) and 95% confidence intervals (CI) derived by patient-level bootstrap resampling (n = 2000). Analyses were performed for the overall cohort and stratified by breast density (non-dense [BI-RADS A-B] vs. dense [BI-RADS C-D]). Results: A total of 78 women (mean age, 55 ± 11 years) were included, of whom 16 had histopathological verification of suspicious lesions, with breast cancer proven in 14, and 62 were classified based on follow-up alone. In the overall cohort (156 breasts, including 15 breasts with malignancies), the AI algorithm achieved an AUC of 0.96 (95% CI: 0.86-1.00). Performance remained high in non-dense breasts (AUC = 0.96; 95% CI: 0.88-1.00) and dense breasts (AUC = 0.99; 95% CI: 0.93-1.00), with no statistically significant difference observed between density subgroups (DeLong test, p = 0.36), although subgroup comparisons were underpowered. Decision curve analysis suggested a consistent positive net benefit across a wide range of threshold probabilities in both density groups. Conclusions: In this preliminary, single-center retrospective cohort, a screening-trained AI algorithm showed promising diagnostic accuracy when applied to symptomatic mammograms. These findings require validation in larger, contemporary, multicenter cohorts before clinical implementation.
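The abstract reports breast-level AUCs with confidence intervals obtained by patient-level bootstrap resampling. The following is a minimal sketch of how such an interval could be computed, not the authors' code. It assumes a percentile interval (the abstract does not state the interval type) and a hypothetical input layout of one `(patient_id, score, label)` row per breast. The function names `auc` and `patient_bootstrap_ci` are illustrative. Patients, rather than individual breasts, are resampled so that both breasts of a woman always stay together.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def patient_bootstrap_ci(records, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling whole patients with replacement.

    records: iterable of (patient_id, score, label), one row per breast
             (hypothetical layout, assumed for this sketch).
    """
    by_patient = {}
    for pid, score, label in records:
        by_patient.setdefault(pid, []).append((score, label))
    ids = sorted(by_patient)
    rng = random.Random(seed)

    stats = []
    for _ in range(n_boot):
        # Draw patients with replacement; keep all breasts of each drawn patient.
        drawn = rng.choices(ids, k=len(ids))
        rows = [row for pid in drawn for row in by_patient[pid]]
        value = auc([s for s, _ in rows], [y for _, y in rows])
        if value is not None:  # skip resamples that lack one of the classes
            stats.append(value)
    if not stats:
        raise ValueError("no bootstrap resample contained both classes")

    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper
```

Resampling at the patient level respects the correlation between the two breasts of the same woman; resampling the 156 breasts independently would tend to give intervals that are too narrow.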

Keywords: artificial intelligence; breast cancer; breast density; computer-aided detection; deep learning; diagnostic imaging; mammography; symptomatic.

Grants and funding