Developing ultrasound-based machine learning models for accurate differentiation between sclerosing adenosis and invasive ductal carcinoma

Eur Radiol. 2026 Jan;36(1):33-44. doi: 10.1007/s00330-025-11777-w. Epub 2025 Jun 28.

Abstract

Objective: This study aimed to develop a machine learning model using breast ultrasound images to improve the non-invasive differential diagnosis between Sclerosing Adenosis (SA) and Invasive Ductal Carcinoma (IDC).

Materials and methods: 2046 ultrasound images from 772 SA and IDC patients were collected, Regions of Interest (ROI) were delineated, and features were extracted. The dataset was split into training and test cohorts, and feature selection was performed by correlation coefficients and Recursive Feature Elimination. 10 classifiers with Grid Search and 5-fold cross-validation were applied during model training. Receiver Operating Characteristic (ROC) curve and Youden index were used to model evaluation. SHapley Additive exPlanations (SHAP) was employed for model interpretation. Another 224 ROIs of 84 patients from other hospitals were used for external validation.

Results: For the ROI-level model, XGBoost with 18 features achieved an area under the curve (AUC) of 0.9758 (0.9654-0.9847) in the test cohort and 0.9906 (0.9805-0.9973) in the validation cohort. For the patient-level model, logistic regression with 9 features achieved an AUC of 0.9653 (0.9402-0.9859) in the test cohort and 0.9846 (0.9615-0.9978) in the validation cohort. The feature "Original shape Major Axis Length" was identified as the most important, with its value positively correlated with a higher likelihood of the sample being IDC. Feature contributions for specific ROIs were visualized as well.

Conclusion: We developed explainable, ultrasound-based machine learning models with high performance for differentiating SA and IDC, offering a potential non-invasive tool for improved differential diagnosis.

Key points: Question Accurately distinguishing between sclerosing adenosis (SA) and invasive ductal carcinoma (IDC) in a non-invasive manner has been a diagnostic challenge. Findings Explainable, ultrasound-based machine learning models with high performance were developed for differentiating SA and IDC, and validated well in external validation cohort. Critical relevance These models provide non-invasive tools to reduce misdiagnoses of SA and improve early detection for IDC.

Keywords: Differential diagnosis; Invasive ductal carcinoma; Machine learning; Sclerosing adenosis; Ultrasound images.

MeSH terms

  • Adult
  • Aged
  • Breast Neoplasms* / diagnostic imaging
  • Carcinoma, Ductal, Breast* / diagnostic imaging
  • Diagnosis, Differential
  • Female
  • Fibrocystic Breast Disease* / diagnostic imaging
  • Humans
  • Image Interpretation, Computer-Assisted / methods
  • Machine Learning*
  • Middle Aged
  • Ultrasonography, Mammary* / methods