Real-world evidence from Germany: representativeness analysis and mortality endpoint validation in electronic health record-derived oncology cohorts

BMC Cancer. 2026 Jan 10. doi: 10.1186/s12885-026-15548-8. Online ahead of print.

Abstract

Background: High-quality real-world data (RWD) for oncology research remains limited in Germany despite significant clinical need. Electronic health record (EHR)-derived datasets offer the potential to capture longitudinal clinical information, but their value depends on representativeness and data quality. We characterized EHR-derived oncology cohorts from Germany, evaluating their alignment with national benchmarks and validating mortality endpoints for real-world evidence generation.

Methods: We analyzed deidentified EHR data from the Germany Flatiron Health Research Database comprising adult patients diagnosed with breast cancer, non-small cell lung cancer (NSCLC), or colorectal cancer (CRC) between 2016 and 2024. Demographic and clinical characteristics were compared to national benchmarks. Overall survival was estimated using Kaplan-Meier methods. We validated our composite mortality variable against German Cancer Registry data.
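Overall survival was estimated with the Kaplan-Meier (product-limit) method. The sketch below illustrates that estimator only; it is not the authors' analysis code. It assumes follow-up is measured in months from the start of first-line therapy and that patients without a recorded death are right-censored at their last follow-up. The function names `kaplan_meier` and `median_survival` are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time per patient (assumed: months from first-line therapy start)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival) pairs, one per distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted in the risk set at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t leaves the risk set afterwards.
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """Smallest time at which estimated survival is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

For example, `median_survival(kaplan_meier([1, 2, 3], [1, 0, 1]))` returns 3: the patient censored at month 2 contributes to the risk set only up to that point, so survival stays above 0.5 until the final death. Counting ties between censorings and deaths as still at risk follows the usual Kaplan-Meier convention.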

Results: The breast cancer cohort (n = 1,305, median age 58 years) included 75% early-stage disease, 80% invasive ductal carcinoma, 67% HR+/HER2-, and 19% triple-negative cases. The NSCLC cohort (n = 866, median age 69 years) comprised 49% stage IV disease, 73% non-squamous histology, and 16% EGFR-positive tumors among tested patients. The CRC cohort (n = 774, median age 67 years) included 31% stage IV disease with 90% receiving primary surgery. Cohort characteristics closely aligned with national benchmarks. Median overall survival from first-line therapy was 26.0 months (breast), 13.4 months (NSCLC), and 21.5 months (CRC). Mortality validation demonstrated 87.7% sensitivity and 91.7% specificity for vital status classification, with 98.8% temporal accuracy within 30 days.
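The abstract does not spell out how the validation metrics were defined. The sketch below assumes the German Cancer Registry is the reference standard, that sensitivity and specificity refer to agreement on vital status (deceased vs. not deceased), and that temporal accuracy is the share of deaths recorded in both sources whose dates differ by at most 30 days. The function `validate_mortality` and its input layout are hypothetical.

```python
from datetime import date


def validate_mortality(records, window_days=30):
    """Compare EHR-derived deaths with a reference registry.

    records -- iterable of (ehr_death_date, registry_death_date) pairs,
               where None means no death is recorded in that source.
    Assumption: the registry is treated as the reference standard.
    """
    tp = fp = tn = fn = 0
    within_window = 0
    for ehr, registry in records:
        if ehr is not None and registry is not None:
            tp += 1  # death captured in both sources
            if abs((ehr - registry).days) <= window_days:
                within_window += 1
        elif registry is not None:
            fn += 1  # registry death missed by the EHR-derived variable
        elif ehr is not None:
            fp += 1  # EHR death not confirmed by the registry
        else:
            tn += 1  # alive (no death) in both sources
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Assumed definition: date agreement among concordant deaths only.
        "temporal_accuracy": within_window / tp if tp else nan,
    }


if __name__ == "__main__":
    sample = [
        (date(2020, 1, 1), date(2020, 1, 10)),  # concordant, 9 days apart
        (date(2020, 3, 1), date(2020, 5, 1)),   # concordant, 61 days apart
        (None, date(2021, 1, 1)),               # false negative
        (date(2022, 1, 1), None),               # false positive
        (None, None), (None, None), (None, None),
    ]
    print(validate_mortality(sample))
```

On this toy sample the function reports a sensitivity of 2/3, a specificity of 0.75, and a temporal accuracy of 0.5, since only one of the two concordant deaths falls within the 30-day window.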

Conclusions: German EHR-derived cancer cohorts are representative of national populations and have validated mortality endpoints, supporting their use for robust real-world evidence generation in oncology research.

Keywords: Electronic health records; Epidemiology; Germany; Real-world data; Representativeness.