Cancer detection via one-shot learning: integrating gene expression and genomic mutation analysis

BMC Bioinformatics. 2025 Oct 6;26(1):239. doi: 10.1186/s12859-025-06257-3.

Abstract

Background: Cancer is a complex disease influenced by numerous concurrent genetic factors that result in diverse tumor microenvironments (TMEs) across different cancer types. Large-scale genomic projects, such as The Cancer Genome Atlas, have underscored the need for molecular classification of cancer to enable more precise therapeutic strategies. Yet traditional machine learning (ML) approaches face several limitations. First, while effective, they predominantly rely on gene expression data and often overlook critical genomic alterations such as copy number alterations, single nucleotide polymorphisms, and other mutational profiles, which limits the scope of biomarker discovery. Second, and most importantly, they usually require large sample sizes.

Results: Building on the hypothesis that type-agnostic representations integrating gene expression with genomic mutations can comprehensively characterize TMEs and capture the similarity or dissimilarity between samples of the same or different types, we propose a novel ML-based method for cancer detection using a one-shot learning framework implemented through Siamese Neural Networks. Our method redefines cancer detection as a similarity-based classification task, allowing the model to generalize to unseen cancer types, a critical advantage in genomics where data scarcity and frequent updates pose significant challenges. To enhance interpretability, we introduce a robust explainability technique based on SHapley Additive exPlanations (SHAP) values that provides clear insights into the contributions of gene expression and mutational data, enabling a deeper understanding of the key factors driving cancer detection decisions.
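
The similarity-based formulation can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a hypothetical stand-in for the trained shared (twin) encoder of a Siamese network and only concatenates and normalizes the two modalities, and all function names, class labels, and toy values are assumptions made for illustration. The point it shows is that a query sample is assigned the label of its nearest single reference sample, so a new cancer type can be added by supplying one reference rather than retraining.

```python
import math


def embed(expression, mutations):
    """Hypothetical stand-in for a trained twin encoder.

    Concatenates gene expression values with binary mutation flags and
    L2-normalizes the result. A real Siamese network would learn this mapping.
    """
    features = [float(x) for x in expression] + [float(m) for m in mutations]
    norm = math.sqrt(sum(x * x for x in features))
    return [x / norm for x in features] if norm > 0 else features


def distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_predict(query, support):
    """Return the label whose single reference embedding is closest to the query.

    `support` maps each class label to exactly one reference embedding.
    """
    return min(support, key=lambda label: distance(query, support[label]))


# Toy support set: one invented reference sample per (hypothetical) cancer type.
support = {
    "type_A": embed([5.0, 0.1, 0.2], [1, 0]),
    "type_B": embed([0.2, 4.8, 0.1], [0, 1]),
}

# Classify a query sample by similarity to the references.
query = embed([4.6, 0.3, 0.1], [1, 0])
prediction = one_shot_predict(query, support)

# An unseen type is handled by adding one reference sample, with no retraining.
support["type_C"] = embed([0.1, 0.2, 5.1], [1, 1])
query_c = embed([0.2, 0.1, 4.9], [1, 1])
prediction_c = one_shot_predict(query_c, support)
```

In this sketch the mutation flags enter the distance on the same footing as the expression values. How the two modalities are weighted in practice would be determined by the trained encoder, which is not specified here.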

Conclusions: Our experimental results show that integrating mutational profiles with gene expression data allows for more accurate cancer type detection and reveals significant mutation patterns. These findings indicate that the proposed method has the potential to significantly enhance cancer type detection by leveraging a more comprehensive understanding of TMEs. Beyond classifying cancer types, the proposed SHAP-based explainability technique enables the identification and analysis of key biomarkers relevant for immunotherapy success, thereby addressing limitations of existing approaches.

Keywords: Cancer gene markers; Cancer type prediction; Deep learning; Explainability; Genomics; Mutational profiles; One-shot learning.

MeSH terms

  • Gene Expression Profiling
  • Genomics* / methods
  • Humans
  • Machine Learning*
  • Mutation*
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics
  • Neural Networks, Computer