Comparison of cancer diagnosis recording between the Clinical Practice Research Datalink, Cancer Registry and Hospital Episodes Statistics

Cancer Epidemiol. 2018 Dec:57:148-157. doi: 10.1016/j.canep.2018.08.009. Epub 2018 Oct 2.


Introduction: The Clinical Practice Research Datalink (CPRD) is a large electronic dataset of primary care medical records. For the purpose of epidemiological studies, it is necessary to ensure accuracy and completeness of cancer diagnoses in CPRD.

Method: Included cases had a colorectal, oesophagogastric (OG), breast, prostate or lung cancer diagnosis recorded in at least one of CPRD, the Cancer Registry (CR) or Hospital Episodes Statistics (HES) between 2000 and 2013. Agreement in diagnosis between the datasets, differences in diagnosis dates, survival at one and five years, and whether patient characteristics differed according to the dataset or the timing of diagnosis were investigated.
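As an illustration only, the kind of between-dataset comparison described in the Method can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' analysis code: the patient identifiers, dates, and the helper names `pct_missing` and `pct_later` are hypothetical, and each dataset is reduced to a simple mapping from patient identifier to recorded diagnosis date.

```python
from datetime import date

# Hypothetical toy records (not study data): patient id -> recorded diagnosis date.
cr = {1: date(2005, 3, 1), 2: date(2006, 7, 15), 3: date(2010, 1, 20), 4: date(2012, 5, 5)}
cprd = {1: date(2005, 3, 20), 2: date(2006, 7, 15), 4: date(2012, 6, 1), 5: date(2011, 9, 9)}


def pct_missing(reference, other):
    """Percentage of cases in `reference` that have no record in `other`."""
    missing = set(reference) - set(other)
    return 100.0 * len(missing) / len(reference)


def pct_later(reference, other):
    """Among cases present in both datasets, the percentage whose date in
    `other` is strictly later than the date in `reference`."""
    shared = set(reference) & set(other)
    later = [pid for pid in shared if other[pid] > reference[pid]]
    return 100.0 * len(later) / len(shared)


# Share of CR cases with no matching CPRD record.
missing_in_cprd = pct_missing(cr, cprd)
# Share of CPRD cases not confirmed in the CR.
unconfirmed_in_cr = pct_missing(cprd, cr)
# Share of linked cases with a later diagnosis date in CPRD than in the CR.
later_in_cprd = pct_later(cr, cprd)

print(f"CR cases missing in CPRD: {missing_in_cprd:.1f}%")
print(f"CPRD cases not confirmed in CR: {unconfirmed_in_cr:.1f}%")
print(f"Linked cases dated later in CPRD: {later_in_cprd:.1f}%")
```

In this sketch the CR acts as the reference dataset when measuring cases missing from CPRD, and the roles are swapped when measuring unconfirmed CPRD cases. A real analysis would also need per-cancer stratification and adjustment for patient characteristics, which are omitted here.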

Results: 116,769 patients were included. For each cancer, approximately 10% of cases identified from CPRD or HES were not confirmed in the CR. Of the cases identified from the CR, 25.5% of colorectal, 26.0% of OG, 8.9% of breast, 32.0% of lung and 18.6% of prostate cases were missing in CPRD. For each cancer, the diagnosis date was recorded later in CPRD than in the CR in the majority of cases, ranging from 59.6% for colorectal to 81.1% for prostate, and particularly when the diagnosis was an emergency. Compared with the CR and HES, the adjusted risk of a missing diagnosis in CPRD was significantly higher if the patient was older, had more co-morbidities or was diagnosed as an emergency. Survival at one and five years was highest for cases identified from CPRD.

Conclusion: Patient demographics and the route of diagnosis impact the accuracy of cancer diagnosis in CPRD. Although CPRD provides invaluable primary care data, patients should ideally be identified from the CR to reduce bias.

Keywords: Accuracy of diagnosis; Cancer registry; Clinical practice research datalink; Hospital episodes statistics; Survival.

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Data Collection / standards*
  • Databases, Factual* / standards
  • Female
  • Hospitals
  • Humans
  • Male
  • Medical Records* / standards
  • Middle Aged
  • Neoplasms / diagnosis*
  • Primary Health Care / statistics & numerical data
  • Registries* / standards