Weakly supervised deep learning for cutaneous squamous and basal cell carcinoma in whole-slide histopathology

J Pathol Clin Res. 2026 Mar;12(2):e70082. doi: 10.1002/2056-4538.70082.

Abstract

Distinguishing infiltrative basal cell carcinoma (BCC) from poorly differentiated cutaneous squamous cell carcinoma (cSCC) remains a significant histopathological challenge. Automated deep learning approaches hold promise for improving diagnostic reliability, yet robust external validation is essential. In this study, we developed a weakly supervised deep learning model to classify these diagnostically challenging subtypes, evaluated its generalizability across internal and external cohorts, and compared it with a dermatopathology foundation model (HistoGPT). The model employed a multiple-instance learning framework (CLAM) using the histopathology-specific transformer Phikon for feature extraction from whole-slide images. Slide-level ground-truth diagnoses for the collected images (n = 335, University Hospital Erlangen) were derived from routine clinical practice and re-evaluated by two board-certified dermatopathologists. Performance was assessed on an internal test set of 84 whole-slide images (27 cSCC and 57 BCC) and two external datasets: the Queensland cohort (n = 10, curated in-distribution cases) and the COBRA cohort (n = 200, broad, partly out-of-distribution cases). Model discrimination was quantified using receiver operating characteristic (ROC) curves, while accuracy, sensitivity, and specificity were reported alongside 95% Wilson confidence intervals (CIs). On the internal test set, the model achieved perfect classification [area under the ROC curve (AUC) = 1.0; 100% accuracy, sensitivity, and specificity]. Similarly strong performance was observed in the Queensland cohort (AUC = 1.0), although this result is limited by the small sample size. In the more heterogeneous COBRA cohort, discrimination remained high (AUC = 0.923, 95% CI 0.885-0.961), but a marked calibration shift required threshold adjustment (balanced accuracy 86.5% at Youden's J). Attention heatmaps highlighted histologically meaningful regions.
In zero-shot evaluation on the internal test set, HistoGPT achieved an overall accuracy of 77%, with high class-wise sensitivity for BCC (98%, 95% CI 91-100) but markedly reduced sensitivity for cSCC (33%, 95% CI 19-52). Fine-tuning a task-specific classifier on the HistoGPT backbone substantially improved performance, achieving near-perfect discrimination and 98% balanced accuracy. These findings demonstrate that weakly supervised deep learning enables highly accurate classification of diagnostically challenging BCC and cSCC subtypes. However, reliable deployment across institutions requires careful calibration and domain adaptation, and even powerful foundation models such as HistoGPT benefit from targeted fine-tuning to ensure robust performance in dermatopathology.
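
The two statistical tools named in the abstract, the Wilson score interval and threshold selection at Youden's J, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `wilson_interval` and `youden_threshold` are hypothetical, the counts 9/27 and 56/57 are inferred from the reported percentages and class sizes rather than stated in the abstract, and the toy scores in the usage note are invented purely for illustration.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Two-sided Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def youden_threshold(scores, labels):
    """Pick the score cut-off that maximizes J = sensitivity + specificity - 1.

    Assumption: labels are 1 for the positive class and 0 otherwise, and a
    slide is called positive when its score is >= the threshold.
    Returns (threshold, J, sensitivity, specificity).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1.0
        # Keep the first threshold that reaches the maximum J.
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best
```

Under the inferred counts, `wilson_interval(9, 27)` rounds to 19-52% and `wilson_interval(56, 57)` rounds to 91-100%, matching the intervals reported for HistoGPT. Because balanced accuracy equals (sensitivity + specificity) / 2, it is (J + 1) / 2 at any threshold, so the reported 86.5% corresponds to J = 0.73. A toy call such as `youden_threshold([0.1, 0.2, 0.3, 0.6, 0.7, 0.9], [0, 0, 0, 1, 1, 1])` selects the cut-off 0.6.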

Keywords: artificial intelligence; basal cell carcinoma; clinical pathology; computer‐assisted image interpretation; deep learning; skin neoplasms; squamous cell carcinoma.

MeSH terms

  • Carcinoma, Basal Cell* / diagnosis
  • Carcinoma, Basal Cell* / pathology
  • Carcinoma, Squamous Cell* / diagnosis
  • Carcinoma, Squamous Cell* / pathology
  • Deep Learning*
  • Diagnosis, Differential
  • Humans
  • Image Interpretation, Computer-Assisted* / methods
  • Reproducibility of Results
  • Skin Neoplasms* / classification
  • Skin Neoplasms* / diagnosis
  • Skin Neoplasms* / pathology