Validation Of Cancer Diagnoses In Electronic Health Records: Results From The Information System For Research In Primary Care (SIDIAP) In Northeast Spain

Clin Epidemiol. 2019 Dec 3;11:1015-1024. doi: 10.2147/CLEP.S225568. eCollection 2019.


Background: Electronic health records are becoming an increasingly valuable resource for epidemiology, but their data quality needs to be quantified. We aimed to validate twenty-five types of incident cancer cases in the Information System for Research in Primary Care (SIDIAP) in Catalonia against the population-based cancer registries of Girona and Tarragona as the gold standard.

Methods: We calculated the sensitivity, positive predictive values (PPV), and the time-difference between the date of diagnosis entered into the SIDIAP and into the registries. We added hospital discharge cancer diagnoses to the SIDIAP to assess sensitivity changes.
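As an illustration of these measures, the following is a minimal sketch in Python and not the authors' actual analysis code. It assumes each source can be reduced to a mapping from a (person, cancer type) pair to a diagnosis date; the function name `validate`, the record layout, and all values are hypothetical. Sensitivity is taken as the share of registry cases also found in the SIDIAP, and PPV as the share of SIDIAP cases confirmed by the registry.

```python
from datetime import date


def validate(ehr, registry):
    """Compare EHR diagnoses against a gold-standard registry.

    Both arguments map (person_id, cancer_type) -> diagnosis date.
    This record layout is an assumption made for illustration only.
    """
    # Cases present in both sources (true positives)
    matched = ehr.keys() & registry.keys()
    # Share of registry cases that the EHR also captured
    sensitivity = len(matched) / len(registry)
    # Share of EHR cases that the registry confirms
    ppv = len(matched) / len(ehr)
    # Positive value: the EHR date is later than the registry date
    day_diffs = {key: (ehr[key] - registry[key]).days for key in matched}
    return sensitivity, ppv, day_diffs


# Made-up records, not data from the study
registry = {
    (1, "breast"): date(2012, 3, 1),
    (2, "colorectal"): date(2013, 6, 15),
    (3, "prostate"): date(2014, 1, 10),
    (4, "stomach"): date(2011, 9, 5),
}
ehr = {
    (1, "breast"): date(2012, 3, 20),
    (2, "colorectal"): date(2013, 6, 1),
    (3, "prostate"): date(2014, 1, 10),
    (5, "pancreas"): date(2015, 2, 2),
}

sensitivity, ppv, day_diffs = validate(ehr, registry)
print(sensitivity, ppv, day_diffs)
```

In this toy example three of the four registry cases appear in the EHR and three of the four EHR cases appear in the registry, so both measures are 0.75. A real analysis would also need record linkage rules, a definition of incident cases, and confidence intervals, none of which are sketched here.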

Results: We identified 27,046 incident cancer diagnoses in the SIDIAP from 2009 to 2015 among the 949,841 residents of Girona and Tarragona. The cancer types with the highest sensitivity were breast (89%, 95% CI: 88-90%), colorectal (81%, 95% CI: 80-82%), and prostate (81%, 95% CI: 80-83%). Trachea, bronchus and lung cancers had the highest PPV (76%, 95% CI: 74-78%), followed by stomach (72%, 95% CI: 68-75%) and pancreas (71%, 95% CI: 67-75%). Most cancer diagnoses were reported with less than three months of difference between the SIDIAP and the registries. More cases were registered first in the registries than in the SIDIAP. By adding cancer diagnoses based on hospital discharge data, sensitivity increased for all cancers, especially for gallbladder and biliary tract, for which the sensitivity increased by 21%.

Conclusion: The SIDIAP includes 76% of the cancer diagnoses in the cancer registries but also contains a considerable number of cases that are not in the registries. The SIDIAP reports most cancer diagnoses within three months of the date of diagnosis recorded in the cancer registries. Our results support the use of the SIDIAP cancer diagnoses for epidemiological research when cancer is the outcome of interest. We recommend adding hospital discharge data to the SIDIAP to increase data quality, particularly for less frequent cancer types.

Keywords: cancer; electronic health records; population-based cancer registries; primary health care; validation studies.

Grant support

Funding (grant number 2017/1630) was obtained from Wereld Kanker Onderzoek Fonds (WKOF), as part of the World Cancer Research Fund International grant program. TDS is funded by the Department of Health of the Generalitat de Catalunya, awarded in the 2016 call under the Strategic Plan for Research and Innovation in Health (PERIS) 2016–2020, modality "incorporation of scientists and technologists", with reference SLT002/16/00308.