Objectives: The Hospital Episode Statistics (HES) dataset is a source of administrative 'big data' with potential for costing purposes in economic evaluations alongside clinical trials. This study assesses the validity of coverage in the HES outpatient dataset.
Methods: Men who died of, or with, prostate cancer were selected from a prostate-cancer screening trial (CAP, Cluster randomised triAl of PSA testing for Prostate cancer). Details of visits to hospital outpatient departments for conditions related to prostate cancer that took place after 1 April 2003 were extracted from medical records (MR); these appointments were then sought in the HES outpatient dataset by matching on appointment date. The matching procedure was repeated separately for the periods before and after 1 April 2008, when the HES outpatient dataset was accredited as a national statistic.
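The date-based matching described in the Methods can be sketched roughly as follows. This is only an illustration, not the study's procedure: the abstract says matching was "based on date" without further detail, so the exact-date rule, the one-to-one pairing of records, the treatment of 1 April 2008 itself as "after", and the names `match_by_date` and `coverage_by_period` are all assumptions.

```python
from collections import Counter
from datetime import date

ACCREDITATION = date(2008, 4, 1)  # accreditation date given in the abstract


def match_by_date(mr_visits, hes_visits):
    """Split MR visits into those found in HES and those not found.

    Each visit is a (patient_id, appointment_date) tuple.
    Assumptions (not stated in the abstract): a match needs the same
    patient and exactly the same date, and each HES record can be
    paired with at most one MR visit.
    """
    available = Counter(hes_visits)  # HES records still unpaired
    matched, unmatched = [], []
    for visit in mr_visits:
        if available[visit] > 0:
            available[visit] -= 1
            matched.append(visit)
        else:
            unmatched.append(visit)
    return matched, unmatched


def coverage_by_period(mr_visits, hes_visits, cutoff=ACCREDITATION):
    """Return {"before": (matched, total), "after": (matched, total)}.

    Assumption: visits on the cutoff date itself count as "after".
    """
    matched, unmatched = match_by_date(mr_visits, hes_visits)
    periods = {
        "before": lambda d: d < cutoff,
        "after": lambda d: d >= cutoff,
    }
    result = {}
    for label, in_period in periods.items():
        m = sum(1 for _, d in matched if in_period(d))
        u = sum(1 for _, d in unmatched if in_period(d))
        result[label] = (m, m + u)
    return result
```

A real linkage would probably need a tolerance window around the appointment date and checks on specialty or provider; the sketch leaves these out because the abstract does not describe them.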
Results: A total of 4922 outpatient appointments were extracted from MR for 370 men. Of these, 4088 appointments were identified in the HES outpatient dataset (83.1%; 95% confidence interval [CI] 82.0-84.1). For appointments occurring before 1 April 2008, 2195/2755 (79.7%; 95% CI 78.2-81.2) were matched, compared with 1893/2167 (87.4%; 95% CI 86.0-88.9) of appointments occurring after 1 April 2008 (p for difference <0.001). At the patient level, 215/370 men (58.1%) had at least one appointment in the MR review that was unmatched in HES, including 20 men (5.4%) who had no appointments identified in HES; the remaining 155 men (41.9%) had all their appointments identified.
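The match proportions and the before/after comparison can be recomputed from the reported counts. The sketch below uses a normal-approximation (Wald) interval and a pooled two-proportion z-test. Both are assumptions: the abstract does not say which interval or test the authors used, so results may differ from the published figures in the last decimal place. The function names are illustrative.

```python
import math


def wald_ci(k, n, z=1.96):
    """Proportion k/n with a normal-approximation 95% interval.

    Assumption: the abstract does not name its interval method.
    """
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Counts taken from the Results above
overall = wald_ci(4088, 4922)
before = wald_ci(2195, 2755)
after = wald_ci(1893, 2167)
p_value = two_proportion_p_value(2195, 2755, 1893, 2167)

for label, (p, lo, hi) in (("overall", overall), ("before", before), ("after", after)):
    print(f"{label}: {100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
print(f"p for difference < 0.001: {p_value < 0.001}")
```

Because appointments are clustered within men, an analysis that accounts for clustering would give wider intervals than this simple approximation.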
Conclusions: The HES outpatient dataset appears reasonably valid for research, particularly following accreditation. The dataset may be a suitable alternative to collecting MR data from hospital notes within a trial, although caution should be exercised with data collected prior to accreditation.