A whole slide image-based machine learning approach to predict ductal carcinoma in situ (DCIS) recurrence risk

Breast Cancer Res. 2019 Jul 29;21(1):83. doi: 10.1186/s13058-019-1165-5.


Background: Breast ductal carcinoma in situ (DCIS) represents approximately 20% of screen-detected breast cancers. The overall risk for DCIS patients treated with breast-conserving surgery stems almost exclusively from local recurrence. Although a mastectomy or adjuvant radiation can reduce recurrence risk, there are significant concerns regarding patient over-/under-treatment. Current clinicopathological markers are insufficient to accurately assess the recurrence risk. To address this issue, we developed a novel machine learning (ML) pipeline to predict risk of ipsilateral recurrence using digitized whole slide images (WSI) and clinicopathologic long-term outcome data from a retrospectively collected cohort of DCIS patients (n = 344) treated with lumpectomy at Nottingham University Hospital, UK.

Methods: The cohort was split case-wise into training (n = 159, 31 with 10-year recurrence) and validation (n = 185, 26 with 10-year recurrence) sets. The sections from primary tumors were stained with H&E, then digitized and analyzed by the pipeline. In the first step, a classifier trained manually by pathologists was applied to digital slides to annotate the areas of stroma, normal/benign ducts, cancer ducts, dense lymphocyte region, and blood vessels. In the second step, a recurrence risk classifier was trained on eight select architectural and spatial organization tissue features from the annotated areas to predict recurrence risk.
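
A case-wise split assigns cases, not individual slides, to the training or validation set, so that no patient contributes data to both. The following is a minimal sketch of that idea only. The abstract does not state how cases were assigned, so the seeded random shuffle, the function name `split_case_wise`, and the `(case_id, slide_id)` record layout are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_case_wise(slides, train_fraction, seed=0):
    """Split (case_id, slide_id) records so each case lands in exactly one set.

    Hypothetical helper: the random assignment is an assumption, not the
    procedure reported in the abstract.
    """
    # Group slides by case so the split is made over cases, not slides.
    by_case = defaultdict(list)
    for case_id, slide_id in slides:
        by_case[case_id].append(slide_id)

    # Shuffle case identifiers reproducibly, then cut at the requested fraction.
    case_ids = sorted(by_case)
    random.Random(seed).shuffle(case_ids)
    n_train = round(len(case_ids) * train_fraction)
    train_cases = set(case_ids[:n_train])

    train = [(c, s) for c, s in slides if c in train_cases]
    validation = [(c, s) for c, s in slides if c not in train_cases]
    return train, validation
```

Splitting on the case identifier rather than the slide identifier is what keeps the validation set independent of the training set when a case has more than one slide.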

Results: The recurrence classifier significantly predicted the 10-year recurrence risk in the training [hazard ratio (HR) = 11.6; 95% confidence interval (CI) 5.3-25.3, accuracy (Acc) = 0.87, sensitivity (Sn) = 0.71, and specificity (Sp) = 0.91] and independent validation [HR = 6.39 (95% CI 3.0-13.8), p < 0.0001; Acc = 0.85, Sn = 0.5, Sp = 0.91] cohorts. Despite the limitations of our cohorts, and in some cases inferior sensitivity performance, our tool showed superior accuracy, specificity, positive predictive value, concordance, and hazard ratios relative to tested clinicopathological variables in predicting recurrences (p < 0.0001). Furthermore, it significantly identified patients who might benefit from additional therapy (validation cohort p = 0.0006).
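
For readers less familiar with the reported metrics, accuracy, sensitivity, and specificity can be computed from a 2x2 confusion table as sketched below. The counts are hypothetical: they were back-calculated for illustration to be roughly consistent with the validation cohort size (n = 185, 26 recurrences) and the rounded rates above, and they are not figures reported by the authors. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # recurrences classified as high risk
    fn: int  # recurrences classified as low risk
    tn: int  # non-recurrences classified as low risk
    fp: int  # non-recurrences classified as high risk


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true recurrences that were flagged as high risk."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-recurrences that were classified as low risk."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts (an assumption, not reported data): 26 recurrences
# and 159 non-recurrences, giving 185 cases in total.
validation_like = ConfusionCounts(tp=13, fn=13, tn=145, fp=14)

print(round(accuracy(validation_like), 2))
print(round(sensitivity(validation_like), 2))
print(round(specificity(validation_like), 2))
```

Because recurrences are the minority class, a high accuracy can coexist with a modest sensitivity, which is why the three values are reported together.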

Conclusions: Our machine learning-based model fills an unmet clinical need for accurately predicting the recurrence risk for lumpectomy-treated DCIS patients.

Keywords: Biomarker; DCIS; Digital image analysis; Machine learning; Prognosis; Recurrence prediction.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Biomarkers, Tumor*
  • Breast Neoplasms / metabolism*
  • Breast Neoplasms / mortality
  • Breast Neoplasms / pathology*
  • Breast Neoplasms / therapy
  • Carcinoma, Intraductal, Noninfiltrating / metabolism*
  • Carcinoma, Intraductal, Noninfiltrating / pathology*
  • Carcinoma, Intraductal, Noninfiltrating / therapy
  • Female
  • Humans
  • Immunohistochemistry*
  • Machine Learning*
  • Mastectomy
  • Middle Aged
  • Neoplasm Grading
  • Neoplasm Recurrence, Local
  • Neoplasm Staging
  • Prognosis
  • Proportional Hazards Models
  • Risk Assessment

