Population-Based Registry Linkages to Improve Validity of Electronic Health Record-Based Cancer Research

Cancer Epidemiol Biomarkers Prev. 2020 Apr;29(4):796-806. doi: 10.1158/1055-9965.EPI-19-0882. Epub 2020 Feb 17.


Background: Integrating electronic health records (EHR) with population-based cancer registry data has substantial potential value for research. Registries provide diagnosis details, tumor characteristics, and treatment summaries, while EHRs contain rich clinical detail. A carefully conducted cancer registry linkage may also be used to improve the internal and external validity of inferences made from EHR-based studies.

Methods: We linked the EHRs of a large, multispecialty, mixed-payer health care system with the statewide cancer registry and assessed the validity of our linked population. For internal validity, we identified patients who might be "missed" in a linkage and whose omission would threaten the internal validity of an EHR study population. For generalizability, we compared linked cases with all other cancer patients in the 22-county EHR catchment region.
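
The comparison of linked cases with other regional cases can be illustrated with a minimal sketch. The abstract does not describe the matching procedure, so everything below is an assumption made for illustration: the `Record` fields, the `link_key` rule (exact match on normalized surname plus birth date), and the toy records are all hypothetical and are not the authors' linkage method. The sketch only shows how registry cases in a catchment region might be partitioned into linked and non-linked groups before their characteristics are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    last_name: str
    birth_date: str  # ISO format, e.g. "1950-03-14"
    county: str


def link_key(rec: Record) -> tuple[str, str]:
    """Hypothetical deterministic key: normalized surname plus birth date."""
    return (rec.last_name.strip().lower(), rec.birth_date)


def split_registry_cases(ehr, registry, catchment):
    """Partition catchment-region registry cases into linked and non-linked."""
    ehr_keys = {link_key(r) for r in ehr}
    in_region = [r for r in registry if r.county in catchment]
    linked = [r for r in in_region if link_key(r) in ehr_keys]
    non_linked = [r for r in in_region if link_key(r) not in ehr_keys]
    return linked, non_linked


# Toy, made-up records for illustration only.
ehr = [Record("Smith ", "1950-03-14", "A"), Record("Lee", "1962-07-01", "B")]
registry = [
    Record("smith", "1950-03-14", "A"),   # matches an EHR patient
    Record("Garcia", "1948-11-30", "B"),  # in region, not in the EHR
    Record("Chen", "1970-01-05", "Z"),    # outside the catchment
]

linked, non_linked = split_registry_cases(ehr, registry, catchment={"A", "B"})

# Share of catchment-region registry cases that appear in the EHR.
linked_share = len(linked) / (len(linked) + len(non_linked))
```

In practice, linkages of this kind typically rely on more identifiers and often on probabilistic matching; exact matching on two fields is used here only to keep the example short.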

Results: From an EHR population of 4.5 million, we identified 306,554 patients with cancer, representing 26% of patients with cancer in the catchment region; 22.7% of linked patients were diagnosed with cancer after they migrated away from our health care system, highlighting an advantage of system-wide linkage. We observed demographic differences between EHR patients and non-EHR patients in the surrounding region and demonstrated use of selection probabilities with model-based standardization to improve generalizability.
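
The idea of using selection probabilities to improve generalizability can be sketched with inverse probability of selection weights. The abstract does not specify the authors' model, so this is a deliberately simplified stand-in: selection probabilities are estimated within strata (a saturated model on a single hypothetical covariate, age group) rather than from a fitted regression, and all data are made up. Each EHR patient is weighted by the inverse of the estimated probability that a regional patient in the same stratum appears in the EHR, so the weighted EHR sample resembles the regional population on that covariate.

```python
from collections import Counter


def selection_weights(ehr_strata, region_strata):
    """Per-stratum weight = 1 / P(in EHR | stratum) = n_region / n_ehr."""
    n_ehr = Counter(ehr_strata)
    n_region = Counter(region_strata)
    return {s: n_region[s] / n_ehr[s] for s in n_ehr}


def weighted_mean(values, strata, weights):
    """Mean of values, weighting each patient by their stratum weight."""
    total_weight = sum(weights[s] for s in strata)
    weighted_sum = sum(weights[s] * v for v, s in zip(values, strata))
    return weighted_sum / total_weight


# Toy region: 4 younger and 6 older patients with cancer (hypothetical).
region_strata = ["<65"] * 4 + ["65+"] * 6

# Toy EHR sample: older patients are under-represented relative to the region.
ehr_strata = ["<65", "<65", "65+", "65+"]
ehr_outcome = [1, 1, 1, 0]  # hypothetical binary outcome per EHR patient

weights = selection_weights(ehr_strata, region_strata)

# Unweighted estimate reflects the EHR population only.
naive = sum(ehr_outcome) / len(ehr_outcome)

# Weighted estimate is standardized to the regional age distribution.
standardized = weighted_mean(ehr_outcome, ehr_strata, weights)
```

With these toy numbers the unweighted estimate is 0.75 and the standardized estimate is 0.70, because the older stratum, where the outcome is less common, receives more weight. A real analysis would model selection on several covariates and would need the assumption that selection into the EHR is explained by those covariates.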

Conclusions: Our experiences provide a foundation to encourage and inform researchers interested in working with EHRs for cancer research, and they offer context for leveraging linkages to assess and improve validity and generalizability.

Impact: Researchers conducting linkages may benefit from considering one or more of these approaches to establish and evaluate the validity of their EHR-based populations. See all articles in this CEBP Focus section, "Modernizing Population Science."

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Data Accuracy*
  • Electronic Health Records / statistics & numerical data*
  • Humans
  • Neoplasms / epidemiology*
  • Registries / statistics & numerical data*
  • Reproducibility of Results
  • Validation Studies as Topic