A multi-representation deep-learning framework for accurate multicancer classification

J Transl Med. 2025 Nov 19;23(1):1317. doi: 10.1186/s12967-025-07325-1.

Abstract

Background: Accurate multicancer classification constitutes a cornerstone of modern oncology, offering critical insights into diagnosis, therapeutic decision-making, and prognostication. Many existing approaches, however, remain restricted to a limited number of cancer types and typically encode genomic information in a single representational modality. The purpose of this study was to develop and evaluate a novel framework that integrates complementary, mutation-derived features to advance cancer classification.

Methods: We present GraphVar, a multi-representation deep learning framework that integrates mutation-derived imaging and numeric genomic features for multicancer classification. GraphVar generates a spatial variant map by encoding gene-level variant categories as pixel intensities. In parallel, it constructs a numeric feature matrix capturing population allele frequencies and mutation spectra. GraphVar employs a ResNet-18 backbone to extract image-level features, a Transformer encoder to model numeric profiles, and a fusion module to integrate both modalities. Model interpretability was assessed by gradient-weighted class activation mapping (Grad-CAM), and functional relevance was validated using Kyoto Encyclopedia of Genes and Genomes (KEGG)-based pathway enrichment analysis.
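
The variant-map step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the variant categories, the intensity values, the gene ordering, or the image layout, so the `CATEGORY_INTENSITY` table, the row-by-row square grid, and the function name `build_variant_map` are all assumptions chosen for illustration.

```python
import math

# Hypothetical mapping from variant category to pixel intensity (0-255).
# The abstract does not specify the categories or the intensity values.
CATEGORY_INTENSITY = {
    "none": 0,
    "silent": 64,
    "missense": 128,
    "nonsense": 192,
    "frameshift": 255,
}


def build_variant_map(gene_order, sample_variants):
    """Encode one sample as a square grid with one pixel per gene.

    gene_order: fixed list of gene symbols shared by all samples
                (assumed; it determines each gene's pixel position).
    sample_variants: dict mapping gene symbol -> variant category.
    Genes without a recorded variant get the "none" intensity, and
    unused pixels at the end of the grid stay at 0.
    """
    side = math.ceil(math.sqrt(len(gene_order)))
    grid = [[0] * side for _ in range(side)]
    for idx, gene in enumerate(gene_order):
        category = sample_variants.get(gene, "none")
        row, col = divmod(idx, side)  # row-major placement
        grid[row][col] = CATEGORY_INTENSITY[category]
    return grid


# Toy example with five genes and two mutated genes.
genes = ["TP53", "VHL", "PIK3CA", "PBRM1", "BRCA1"]
sample = {"VHL": "frameshift", "PBRM1": "missense"}
variant_map = build_variant_map(genes, sample)
```

Because every sample uses the same gene order, a given pixel always refers to the same gene, which is what would allow a convolutional backbone and Grad-CAM to point back to specific genes under this layout assumption.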

Results: In a cohort of 10,112 patients spanning 33 cancer types, GraphVar achieved a precision of 99.85%, a recall of 99.82%, an F1-score of 99.82%, and an accuracy of 99.82%. Grad-CAM highlighted the model's ability to localize gene-level molecular patterns and prioritize biologically relevant candidates. The KEGG-based pathway enrichment analysis of kidney renal clear cell carcinoma (KIRC) and breast invasive carcinoma (BRCA) samples supported the biological relevance of GraphVar-identified genes, demonstrating its capacity to capture functionally meaningful genomic signatures.
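
For readers who want to see how the four reported metrics relate to one another, the following sketch computes them from predicted and true labels. The abstract does not say how precision, recall, and F1 were averaged across the 33 classes; macro-averaging (an unweighted mean over classes) is assumed here, and the function name `classification_metrics` and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return macro-averaged precision, recall, F1, and overall accuracy.

    Macro-averaging is an assumption; the abstract does not state
    which averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(labels)
    return {
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
    }


# Toy example: four samples, two cancer types, one misclassification.
y_true = ["KIRC", "KIRC", "BRCA", "BRCA"]
y_pred = ["KIRC", "BRCA", "BRCA", "BRCA"]
metrics = classification_metrics(y_true, y_pred)
```

Under macro-averaging, each cancer type contributes equally regardless of how many patients it has, so precision, recall, and accuracy can differ slightly from one another, as they do in the reported figures.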

Conclusions: These findings establish GraphVar as a robust and interpretable framework for multicancer classification. The model's high accuracy and its ability to identify functionally meaningful genomic signatures indicate its potential as a tool to support precision diagnostics and therapeutic strategies, warranting further translational studies.

Keywords: Deep learning; Multi-representation; Transformer; Variant.

MeSH terms

  • Deep Learning*
  • Genomics
  • Humans
  • Mutation / genetics
  • Neoplasms* / classification
  • Neoplasms* / genetics