Development of a Longitudinal Prostate Cancer Transcriptomic and Clinical Data Linkage

JAMA Netw Open. 2024 Jun 3;7(6):e2417274. doi: 10.1001/jamanetworkopen.2024.17274.

Abstract

Importance: Although tissue-based gene expression testing has become widely used for prostate cancer risk stratification, its prognostic performance in the setting of clinical care is not well understood.

Objective: To develop a linkage between a prostate genomic classifier (GC) and clinical data across payers and sites of care in the US.

Design, setting, and participants: In this cohort study, clinical and transcriptomic data from clinical use of a prostate GC between 2016 and 2022 were linked with data aggregated from insurance claims, pharmacy records, and electronic health record (EHR) data. Participants were anonymously linked between datasets by deterministic methods through a deidentification engine using encrypted tokens. Algorithms were developed and refined for identifying prostate cancer diagnoses, treatment timing, and clinical outcomes using diagnosis codes, Current Procedural Terminology codes, pharmacy codes, Systematized Nomenclature of Medicine Clinical Terms, and unstructured text in the EHR. Data analysis was performed from January 2023 to January 2024.
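The deterministic, token-based linkage described above can be illustrated with a minimal sketch. The abstract does not describe how the deidentification engine builds its tokens, so everything here is an assumption made for illustration: the keyed hash (HMAC-SHA256), the identifier fields, the secret key, and the helper names `make_token` and `link_records` are all hypothetical, and the records are invented.

```python
import hashlib
import hmac

# Hypothetical secret held by the deidentification engine (assumption, not from the study).
SECRET_KEY = b"example-key-not-from-the-study"


def make_token(first: str, last: str, dob: str) -> str:
    """Build a keyed hash over normalized identifiers (illustrative field choice)."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(gc_records, clinical_records):
    """Deterministic linkage: records join only when their tokens match exactly."""
    clinical_by_token = {rec["token"]: rec for rec in clinical_records}
    linked = []
    for rec in gc_records:
        match = clinical_by_token.get(rec["token"])
        if match is not None:
            linked.append({**rec, **match})
    return linked


# Invented example records; no real participant data.
gc_records = [
    {"token": make_token("John", "Doe", "1958-03-14"), "gc_score": 0.62},
    {"token": make_token("Alan", "Roe", "1950-11-02"), "gc_score": 0.31},
]
clinical_records = [
    # Same person as the first GC record, with different casing and whitespace.
    {"token": make_token(" john ", "DOE", "1958-03-14"), "dx_code": "C61"},
]

linked = link_records(gc_records, clinical_records)
print(len(linked), "of", len(gc_records), "records linked")
```

Because matching is exact, any normalization has to happen before hashing; a probabilistic approach would tolerate more variation in identifiers but is not what the abstract describes.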

Exposure: Diagnosis of prostate cancer.

Main outcomes and measures: The primary outcomes were biochemical recurrence and development of prostate cancer metastases after diagnosis or radical prostatectomy (RP). The sensitivity of the linkage and identification algorithms for clinical and administrative data was calculated using clinical and pathological information obtained during the GC testing process as the reference standard.
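As a rough illustration of the sensitivity measure defined above, the sketch below computes sensitivity against a reference standard along with a 95% confidence interval. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, the function name is hypothetical, and the counts are invented rather than taken from the study.

```python
import math


def sensitivity_with_wilson_ci(true_positives: int, false_negatives: int, z: float = 1.96):
    """Return (sensitivity, lower, upper) using a Wilson score interval.

    true_positives: reference-standard cases that the algorithm also identified.
    false_negatives: reference-standard cases that the algorithm missed.
    """
    n = true_positives + false_negatives
    if n == 0:
        raise ValueError("at least one reference-standard case is required")
    p = true_positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical counts for illustration only.
sens, lower, upper = sensitivity_with_wilson_ci(true_positives=850, false_negatives=150)
print(f"sensitivity = {sens:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

With cohorts in the tens of thousands, as in this study, intervals of this kind become narrow, which is consistent with the tight confidence limits reported in the Results.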

Results: A total of 92 976 of 95 578 (97.2%) participants who underwent prostate GC testing were successfully linked to administrative and clinical data, including 53 871 who underwent biopsy testing and 39 105 who underwent RP testing. The median (IQR) age at GC testing was 66.4 (61.0-71.0) years. The sensitivity of the EHR linkage data for prostate cancer diagnoses was 85.0% (95% CI, 84.7%-85.2%), including 80.8% (95% CI, 80.4%-81.1%) for biopsy-tested participants and 90.8% (95% CI, 90.5%-91.0%) for RP-tested participants. Year of treatment was concordant in 97.9% (95% CI, 97.7%-98.1%) of participants undergoing GC testing at RP and in 86.0% (95% CI, 85.6%-86.4%) of participants undergoing biopsy testing. The sensitivity of the linkage was 48.6% (95% CI, 48.1%-49.1%) for identifying RP and 50.1% (95% CI, 49.7%-50.5%) for identifying prostate biopsy.

Conclusions and relevance: This study established a national-scale linkage of transcriptomic and longitudinal clinical data yielding high accuracy for identifying key clinical junctures, including diagnosis, treatment, and early cancer outcomes. This resource can be leveraged to improve understanding of disease biology, patterns of care, and treatment effectiveness.

MeSH terms

  • Aged
  • Algorithms
  • Cohort Studies
  • Electronic Health Records / statistics & numerical data
  • Humans
  • Information Storage and Retrieval
  • Longitudinal Studies
  • Male
  • Middle Aged
  • Prostatectomy
  • Prostatic Neoplasms* / diagnosis
  • Prostatic Neoplasms* / genetics
  • Prostatic Neoplasms* / pathology
  • Transcriptome* / genetics