Spatial Mapping of Gene Signatures in Hematoxylin and Eosin-Stained Images: A Proof of Concept for Interpretable Predictions Using Additive Multiple Instance Learning

Mod Pathol. 2025 Aug;38(8):100772. doi: 10.1016/j.modpat.2025.100772. Epub 2025 Apr 11.

Abstract

The relative abundance of cancer-associated fibroblast (CAF) subtypes influences a tumor's response to treatment, especially immunotherapy. However, the gene expression signatures associated with these CAF subtypes have yet to realize their potential as clinical biomarkers. Here, we describe an interpretable machine learning approach, additive multiple instance learning (aMIL), to predict bulk gene expression signatures from hematoxylin and eosin-stained whole-slide images, focusing on an immunosuppressive LRRC15+ CAF-enriched TGFβ-CAF signature. aMIL models accurately predicted the TGFβ-CAF signature across various cancer types. Tissue regions contributing most strongly to slide-level predictions of TGFβ-CAF were evaluated by machine learning models characterizing spatial distributions of diverse cell and tissue types, stromal subtypes, and nuclear morphology. In breast cancer, regions contributing most to TGFβ-CAF-high predictions ("excitatory") were localized to cancer stroma with high fibroblast density and mature collagen fibers. Regions contributing most to TGFβ-CAF-low predictions ("inhibitory") were localized to cancer epithelium and densely inflamed stroma. Fibroblast and lymphocyte nuclear morphology also differed between excitatory and inhibitory regions. Thus, aMIL enables a data-driven link between histologic features and transcription, offering biological interpretability beyond typical black-box models.
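The additive idea in the abstract, a slide-level prediction that decomposes into per-region contributions, can be illustrated with a minimal sketch. This is not the authors' model. It assumes a linear attention scorer and a linear regression head for brevity, and the names `additive_mil_predict`, `split_regions`, `attn_w`, `head_w`, and `head_b` are hypothetical. The point it shows is that when each attention-scaled patch is scored separately and the scores are summed, every patch has a signed contribution that can be read as "excitatory" (positive) or "inhibitory" (negative).

```python
import math


def additive_mil_predict(patch_features, attn_w, head_w, head_b=0.0):
    """Hypothetical additive MIL forward pass for one slide.

    patch_features: list of per-patch embedding vectors (lists of floats).
    attn_w: attention scoring weights (assumed linear for this sketch).
    head_w, head_b: weights and bias of an assumed linear regression head.
    Returns (slide_prediction, per_patch_contributions).
    """
    # Attention logits, one per patch.
    scores = [sum(f * w for f, w in zip(feat, attn_w)) for feat in patch_features]
    # Softmax over patches, shifted by the max for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Each patch is scored on its own: no mixing of patches before the head.
    contributions = [
        a * sum(f * w for f, w in zip(feat, head_w))
        for a, feat in zip(attn, patch_features)
    ]
    # The slide-level prediction is the sum of patch contributions plus a bias.
    prediction = sum(contributions) + head_b
    return prediction, contributions


def split_regions(contributions, top_k=2):
    """Return indices of the most positive and most negative patches."""
    order = sorted(range(len(contributions)), key=lambda i: contributions[i])
    inhibitory = [i for i in order[:top_k] if contributions[i] < 0]
    excitatory = [i for i in reversed(order[-top_k:]) if contributions[i] > 0]
    return excitatory, inhibitory


# Toy slide with two patches and made-up weights (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0]]
prediction, contributions = additive_mil_predict(
    features, attn_w=[0.0, 0.0], head_w=[2.0, -4.0]
)
excitatory, inhibitory = split_regions(contributions, top_k=1)
```

Because the prediction is an exact sum, a heatmap of `contributions` over patch coordinates accounts for the whole slide-level score, which is what makes overlaying it on cell- and tissue-type maps meaningful in this kind of analysis.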

Keywords: biomarkers; computational pathology; gene expression signatures; machine learning.

MeSH terms

  • Biomarkers, Tumor* / genetics
  • Breast Neoplasms* / genetics
  • Breast Neoplasms* / pathology
  • Cancer-Associated Fibroblasts* / metabolism
  • Cancer-Associated Fibroblasts* / pathology
  • Eosine Yellowish-(YS)
  • Female
  • Gene Expression Profiling
  • Hematoxylin
  • Humans
  • Machine Learning*
  • Multiple-Instance Learning Algorithms
  • Proof of Concept Study
  • Staining and Labeling
  • Transcriptome*
  • Transforming Growth Factor beta / metabolism

Substances

  • Hematoxylin
  • Eosine Yellowish-(YS)
  • Biomarkers, Tumor
  • Transforming Growth Factor beta