Deep Learning vs Traditional Breast Cancer Risk Models to Support Risk-Based Mammography Screening

Constance D Lehman; Sarah Mercaldo; Leslie R Lamb; Tari A King; Leif W Ellisen; Michelle Specht; Rulla M Tamimi

doi:10.1093/jnci/djac142

J Natl Cancer Inst. 2022 Oct 6;114(10):1355-1363. doi: 10.1093/jnci/djac142.

Authors

Constance D Lehman¹ ², Sarah Mercaldo¹ ², Leslie R Lamb¹ ², Tari A King³ ⁴, Leif W Ellisen¹ ⁵, Michelle Specht¹ ³, Rulla M Tamimi⁶

Affiliations

¹ Massachusetts General Hospital, Boston, MA, USA.
² Harvard Medical School, Radiology, Boston, MA, USA.
³ Harvard Medical School, Surgery, Boston, MA, USA.
⁴ Dana-Farber/Brigham and Women's Cancer Center, Boston, MA, USA.
⁵ Harvard Medical School, Medicine, Boston, MA, USA.
⁶ Weill Cornell Medicine, Epidemiology and Population Health Sciences, New York, NY, USA.

Abstract

Background: Deep learning breast cancer risk models demonstrate improved accuracy compared with traditional risk models but have not been prospectively tested. We compared the accuracy of a deep learning risk score derived from the patient's prior mammogram to traditional risk scores to prospectively identify patients with cancer in a cohort due for screening.

Methods: We collected data on 119 139 bilateral screening mammograms in 57 617 consecutive patients screened at 5 facilities between September 18, 2017, and February 1, 2021. Patient demographics were retrieved from electronic medical records, cancer outcomes determined through regional tumor registry linkage, and comparisons made across risk models using Wilcoxon and Pearson χ² 2-sided tests. Deep learning, Tyrer-Cuzick, and National Cancer Institute Breast Cancer Risk Assessment Tool (NCI BCRAT) risk models were compared with respect to performance metrics and area under the receiver operating characteristic curves.
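As an illustration of the area under the receiver operating characteristic curve named above, the following is a minimal sketch, not the authors' code. It relies on the standard identity that the AUC equals the probability that a randomly chosen patient with cancer receives a higher risk score than a randomly chosen patient without cancer, with ties counted as one half. The function name `auc_mann_whitney` and the toy scores and labels are hypothetical and are not study data.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison of cases (label 1) and non-cases (label 0).

    Ties between a case and a non-case contribute 0.5.
    This O(n_pos * n_neg) loop is for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): four patients, two with cancer.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc_mann_whitney(toy_scores, toy_labels)
print(toy_auc)  # 0.75
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, which is a useful reference point when reading the values reported in the Results.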

Results: Cancers detected per thousand patients screened were higher in patients at increased risk by the deep learning model (8.6, 95% confidence interval [CI] = 7.9 to 9.4) compared with Tyrer-Cuzick (4.4, 95% CI = 3.9 to 4.9) and NCI BCRAT (3.8, 95% CI = 3.3 to 4.3) models (P < .001). The area under the receiver operating characteristic curve of the deep learning model (0.68, 95% CI = 0.66 to 0.70) was higher than that of the Tyrer-Cuzick (0.57, 95% CI = 0.54 to 0.60) and NCI BCRAT (0.57, 95% CI = 0.54 to 0.60) models. Simulated screening of patients in the top 50th percentile of risk by the deep learning model captured statistically significantly more patients with cancer compared with Tyrer-Cuzick and NCI BCRAT models (P < .001).
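The two kinds of quantity reported above, a detection rate per thousand with a 95% CI and the share of cancers captured when only the higher-risk half of a cohort is screened, can be sketched as follows. This is a hedged illustration: the abstract does not state how its intervals were computed, so the Wilson score interval used here is an assumption, and the function names and all counts are hypothetical rather than taken from the study.

```python
import math
from typing import Sequence, Tuple


def rate_per_thousand(events: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) per 1000 using a Wilson score interval.

    The interval method is an assumption; the source does not specify one.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("require 0 <= events <= total and total > 0")
    p = events / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return 1000 * p, 1000 * (centre - half), 1000 * (centre + half)


def capture_in_top_half(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of all cancers found among the higher-scoring half of patients."""
    total_cancers = sum(labels)
    if total_cancers == 0:
        raise ValueError("no cancers in cohort")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_half = ranked[: len(ranked) // 2]
    return sum(y for _, y in top_half) / total_cancers


# Hypothetical counts (not study data): 5 cancers among 1000 screened patients.
rate, lower, upper = rate_per_thousand(5, 1000)

# Hypothetical toy cohort: one of the two cancers falls in the top-scoring half.
captured = capture_in_top_half([0.9, 0.2, 0.7, 0.1], [1, 0, 0, 1])
print(captured)  # 0.5
```

Running `capture_in_top_half` once per risk model on the same cohort gives the kind of side-by-side comparison the simulated screening describes; the significance test reported in the abstract is not reproduced here.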

Conclusions: A deep learning model to assess breast cancer risk can support feasible and effective risk-based screening and is superior to traditional models to identify patients destined to develop cancer in large screening cohorts.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / diagnostic imaging
Breast Neoplasms* / epidemiology
Deep Learning*
Early Detection of Cancer / methods
Female
Humans
Mammography / methods
Risk Assessment / methods