Early prediction of prostate cancer risk in younger men using polygenic risk scores and electronic health records

Cancer Med. 2023 Jan;12(1):379-386. doi: 10.1002/cam4.4934. Epub 2022 Jun 25.


Background: Prostate cancer (PCa) screening is not routinely conducted in men aged 55 and younger, although this age group accounts for more than 10% of cases. Applying polygenic risk scores (PRSs) and patient data to the early prediction of PCa may enable earlier intervention and improve survival. We developed machine learning (ML) models that combine PRSs with patient data to predict PCa risk in men aged 55 and younger.
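A polygenic risk score is conventionally computed as a weighted sum of risk-allele dosages, PRS = Σ β_i · g_i, where g_i ∈ {0, 1, 2} is the number of risk alleles a person carries at variant i and β_i is that variant's per-allele effect size (for example, a log odds ratio from a genome-wide association study). The abstract does not say how the PRSs in this study were constructed, so the following is only a minimal sketch of that general idea. The variant IDs and weights used with it are hypothetical.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages over variants present in both inputs.

    dosages: variant ID -> number of risk alleles carried (0, 1 or 2).
    weights: variant ID -> per-allele effect size (e.g. a log odds ratio).
    Variants missing from either mapping are skipped (a simplifying assumption;
    real pipelines usually impute or flag missing genotypes instead).
    """
    shared = weights.keys() & dosages.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical example: three made-up variants with made-up weights.
example_weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}
example_dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
example_score = polygenic_risk_score(example_dosages, example_weights)  # about 0.15
```

In a model such as the ones described here, a score like this would typically enter as one feature alongside clinical variables rather than being used on its own.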

Methods: We conducted a retrospective study of 91,106 male patients aged 35-55 years from the UK Biobank database. Five gradient boosting models were developed and validated using routine screening data, PRSs, additional clinical data, or combinations of the three.

Results: Models combining PRSs with patient data outperformed models that used PRSs or patient data alone, and the best-performing models achieved an area under the receiver operating characteristic curve (AUC) of 0.788. Our models also showed a substantially lower false positive rate (35.4%) than standard screening with prostate-specific antigen (60%-67%).
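The two reported metrics can be made concrete with a short sketch. The AUC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case (ties counted as one half), and the false positive rate is FP / (FP + TN), the share of non-cases flagged at a chosen decision threshold. The abstract does not state which threshold produced the 35.4% figure, so the threshold and the toy labels and scores below are purely illustrative assumptions, not study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: fraction of case/non-case pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def false_positive_rate(labels: Sequence[int], scores: Sequence[float],
                        threshold: float) -> float:
    """FP / (FP + TN): share of non-cases whose score is at or above the threshold."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not negatives:
        raise ValueError("need at least one negative case")
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)


# Toy data (hypothetical): 1 = developed PCa, 0 = did not.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)                   # 5 of 6 pairs ranked correctly
toy_fpr = false_positive_rate(toy_labels, toy_scores, 0.5)  # 1 of 3 non-cases flagged
```

The quadratic pairwise loop is chosen for readability; for cohorts the size of this one, a rank-based or library implementation would be the practical choice.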

Conclusion: This study provides the first preliminary evidence for using PRSs with patient data in an ML algorithm for PCa risk prediction in men aged 55 and younger, for whom screening is not standard practice.

Keywords: algorithms; decision support; machine learning; polygenic risk score; prostate cancer.

MeSH terms

  • Adult
  • Databases, Factual
  • Electronic Health Records
  • Humans
  • Male
  • Middle Aged
  • Predictive Value of Tests
  • Prostatic Neoplasms* / epidemiology
  • Prostatic Neoplasms* / genetics
  • Retrospective Studies
  • Risk Assessment / methods
  • Risk Factors