A deep learning image-based intrinsic molecular subtype classifier of breast tumors reveals tumor heterogeneity that may affect survival

Breast Cancer Res. 2020 Jan 28;22(1):12. doi: 10.1186/s13058-020-1248-3.


Background: Breast cancer intrinsic molecular subtype (IMS), as classified by the expression-based PAM50 assay, is considered a strong prognostic feature even after controlling for standard clinicopathological features such as age, grade, and nodal status, yet the molecular testing required to determine these subtypes is not routinely performed. Furthermore, when bulk assays such as RNA sequencing are performed, intratumoral heterogeneity that may affect prognosis and therapeutic decision-making can be missed.

Methods: As a more facile and readily available method for determining IMS in breast cancer, we developed a deep learning approach for approximating PAM50 intrinsic subtyping using only whole-slide images of H&E-stained breast biopsy tissue sections. This algorithm was trained on images from 443 tumors that had previously undergone PAM50 subtyping to classify small patches of the images into four major molecular subtypes (Basal-like, HER2-enriched, Luminal A, and Luminal B), as well as Basal vs. non-Basal. The algorithm was subsequently used for subtype classification of a held-out set of 222 tumors.
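As an illustration of what "small patches" of a whole-slide image can mean in practice, the following minimal sketch enumerates the top-left corners of non-overlapping square patches on a slide of given pixel dimensions. The function name `patch_grid`, the non-overlapping layout, and the patch size are assumptions made for this example; the abstract does not specify how patches were extracted.

```python
from typing import Iterator, Tuple


def patch_grid(width: int, height: int, patch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield the (x, y) top-left corner of each non-overlapping square patch.

    Hypothetical helper: border regions too small to hold a full patch
    are skipped. Patch size and layout are illustrative assumptions.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    # Step through the slide row by row, keeping only patches that fit entirely.
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield (x, y)


if __name__ == "__main__":
    # A toy 5 x 4 pixel "slide" tiled with 2 x 2 patches.
    for corner in patch_grid(5, 4, 2):
        print(corner)
```

Each coordinate would then be used to crop a patch and pass it to the patch-level classifier; real whole-slide images are far larger and are normally read with a dedicated slide-reading library.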

Results: This deep learning image-based classifier correctly subtyped the majority of samples in the held-out set of tumors. However, in many cases, significant heterogeneity was observed in assigned subtypes across patches from within a single whole-slide image. We performed further analysis of heterogeneity, focusing on contrasting Luminal A and Basal-like subtypes because classifications from our deep learning algorithm, similar to PAM50, are associated with significant differences in survival between these two subtypes. Patients with tumors classified as heterogeneous were found to have survival intermediate between Luminal A and Basal patients, as well as more varied hormone receptor expression patterns.
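The idea of patch-level heterogeneity within one slide can be sketched as follows: patch labels are tallied, the most frequent label is taken as the slide-level call, and the slide is flagged as heterogeneous when both Luminal A and Basal-like patches make up a non-trivial share. This is only a sketch under stated assumptions. The function `summarize_slide`, the majority-vote rule, and the `minority_threshold` value of 0.2 are hypothetical and are not taken from the paper, which does not describe its heterogeneity criterion in the abstract.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_slide(
    patch_labels: List[str], minority_threshold: float = 0.2
) -> Tuple[str, Dict[str, float], bool]:
    """Summarize patch-level subtype calls for one whole-slide image.

    Returns (majority_label, label_fractions, is_heterogeneous).
    The threshold is an assumed, illustrative value.
    """
    if not patch_labels:
        raise ValueError("patch_labels must not be empty")
    counts = Counter(patch_labels)
    total = len(patch_labels)
    fractions = {label: n / total for label, n in counts.items()}
    # Slide-level call: the most frequent patch label (assumed rule).
    majority = counts.most_common(1)[0][0]
    # Flag the slide when both contrasted subtypes are well represented.
    lum_a = fractions.get("Luminal A", 0.0)
    basal = fractions.get("Basal-like", 0.0)
    heterogeneous = min(lum_a, basal) >= minority_threshold
    return majority, fractions, heterogeneous


if __name__ == "__main__":
    labels = ["Luminal A"] * 6 + ["Basal-like"] * 3 + ["Luminal B"]
    print(summarize_slide(labels))
```

Under these assumptions, a slide with 60% Luminal A and 30% Basal-like patches is called Luminal A but flagged as heterogeneous, whereas a slide with only 10% Basal-like patches is not.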

Conclusions: Here, we present a method for minimizing manual work required to identify cancer-rich patches among all multiscale patches in H&E-stained WSIs that can be generalized to any indication. These results suggest that advanced deep machine learning methods that use only routinely collected whole-slide images can approximate RNA-seq-based molecular tests such as PAM50 and, importantly, may increase detection of heterogeneous tumors that may require more detailed subtype analysis.

Keywords: Breast cancer; Deep learning algorithm; Intrinsic molecular subtype (IMS); Whole-slide imaging (WSI).

MeSH terms

  • Biomarkers, Tumor / genetics*
  • Breast Neoplasms / classification
  • Breast Neoplasms / genetics
  • Breast Neoplasms / mortality*
  • Breast Neoplasms / pathology*
  • Deep Learning*
  • Female
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Molecular Typing / methods*
  • Neoplasm Grading
  • Receptor, ErbB-2 / metabolism
  • Survival Rate

Substances

  • Biomarkers, Tumor
  • ERBB2 protein, human
  • Receptor, ErbB-2