A machine learning framework using urinary biomarkers for pancreatic ductal adenocarcinoma prediction with post hoc validation via single-cell transcriptomics

Brief Bioinform. 2025 Nov 1;26(6):bbaf583. doi: 10.1093/bib/bbaf583.

Abstract

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer with a poor prognosis, which underscores the need for early and accurate diagnostic tools. In this study, we propose a comparative framework to assess how well machine learning (ML) models built on urinary biomarkers combined with demographic data can predict PDAC. The study also used single-cell RNA sequencing (scRNA-seq) analysis to examine the expression of the included biomarker genes. Using the available biomarkers together with demographic information, we compared different preprocessing techniques, normalization approaches, ML techniques, and deep learning (DL) approaches to build a comprehensive prediction model. The scRNA-seq analysis of a pancreatic single-cell sample also highlighted the significance of the urinary biomarkers: one of these markers was identified as one of the three most highly expressed genes in PDAC tissues. Predictive modeling was conducted for both binary and multiclass classification using both ML and DL approaches. The comparative analysis treated every combination of the included parameters as a separate modeling setting; among these settings, the DL approach with binary classification outperformed the others, achieving 91% accuracy. This framework highlighted the critical role of demographic data and suggested ways to include such features in the model without compromising predictive accuracy. Future work will focus on evaluating the framework on different datasets, integrating additional omics data, and exploring advanced DL architectures to further improve predictive performance.
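The abstract does not name the specific preprocessing steps, normalization methods, models, or demographic variables used. As a rough illustration of what "treating every parameter combination as a modeling setting" can look like, the following is a minimal sketch, not the authors' pipeline. It assumes synthetic data, three hypothetical urinary biomarker columns plus hypothetical age and sex covariates, a binary PDAC vs non-PDAC label, and two simple stand-in classifiers (nearest centroid and logistic regression) in place of the paper's ML and DL models. All function names (`make_synthetic`, `fit_scaler`, `compare`, etc.) are hypothetical.

```python
import math
import random
from itertools import product


def make_synthetic(n, seed=0):
    """Synthetic stand-in data: 3 hypothetical biomarker levels + age + sex."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = PDAC, 0 = non-PDAC (binary setting)
        shift = 1.5 if label else 0.0
        markers = [rng.gauss(2.0 + shift, 1.0) for _ in range(3)]
        age = rng.gauss(60 + 5 * label, 10)  # hypothetical demographic covariate
        sex = float(rng.randint(0, 1))       # hypothetical 0/1 coding
        X.append(markers + [age, sex])
        y.append(label)
    return X, y


def fit_scaler(X, method):
    """Fit a per-feature (offset, scale) pair; returns a transform for new rows."""
    params = []
    for col in zip(*X):
        if method == "zscore":
            mu = sum(col) / len(col)
            sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
            params.append((mu, sd or 1.0))
        elif method == "minmax":
            params.append((min(col), (max(col) - min(col)) or 1.0))
        else:  # "none": identity transform
            params.append((0.0, 1.0))
    return lambda rows: [[(v - o) / s for v, (o, s) in zip(r, params)] for r in rows]


def sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nearest_centroid(Xtr, ytr):
    # Predict the class whose training-set mean vector is closest (squared Euclidean).
    cents = {}
    for k in (0, 1):
        rows = [x for x, t in zip(Xtr, ytr) if t == k]
        cents[k] = [sum(c) / len(rows) for c in zip(*rows)]
    return lambda x: min(cents, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, cents[k])))


def logistic_regression(Xtr, ytr, lr=0.1, epochs=300):
    w, b, n = [0.0] * len(Xtr[0]), 0.0, len(Xtr)
    for _ in range(epochs):  # full-batch gradient descent on log-loss
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(Xtr, ytr):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


MODELS = {"nearest_centroid": nearest_centroid, "logistic_regression": logistic_regression}


def compare(seed=0, n=200, n_train=150):
    """Score every (normalization, model) combination on a held-out split."""
    X, y = make_synthetic(n, seed)
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    results = {}
    for norm, (name, fit) in product(("none", "zscore", "minmax"), MODELS.items()):
        scale = fit_scaler(Xtr, norm)  # fitted on training rows only (no leakage)
        predict = fit(scale(Xtr), ytr)
        hits = sum(predict(x) == t for x, t in zip(scale(Xte), yte))
        results[(norm, name)] = hits / len(yte)
    return results


if __name__ == "__main__":
    for (norm, name), acc in sorted(compare().items(), key=lambda kv: -kv[1]):
        print(f"{norm:>6} + {name:<20} accuracy = {acc:.2f}")
```

The scaler is fitted on the training split only and then applied to the held-out rows, so each setting's accuracy is not inflated by test-set statistics. Extending the grid to more normalization methods or models only means adding entries to the tuple or to `MODELS`; a multiclass variant would need label-aware models rather than the binary-only ones assumed here.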

Keywords: biomarker discovery; deep learning; machine learning; pancreatic ductal adenocarcinoma (PDAC); predictive modeling; single-cell RNA sequencing (scRNA-seq); urinary biomarkers.

MeSH terms

  • Biomarkers, Tumor* / genetics
  • Biomarkers, Tumor* / urine
  • Carcinoma, Pancreatic Ductal* / diagnosis
  • Carcinoma, Pancreatic Ductal* / genetics
  • Carcinoma, Pancreatic Ductal* / urine
  • Female
  • Gene Expression Profiling
  • Humans
  • Machine Learning*
  • Male
  • Pancreatic Neoplasms* / diagnosis
  • Pancreatic Neoplasms* / genetics
  • Pancreatic Neoplasms* / urine
  • Single-Cell Analysis*
  • Transcriptome*

Substances

  • Biomarkers, Tumor