Background: The Epidemiologic Registry of Cystic Fibrosis (ERCF) was a multicentre, longitudinal follow-up project of cystic fibrosis patients enrolled at some 200 centres in nine European countries between 1994 and 1999.
Purpose: We aimed to assess and improve the quality of a subset of ERCF data relating to seven English centres (1184 patients) before using the data in a long-term cost-effectiveness analysis of dornase alfa (Pulmozyme). Specifically, we wanted to assess the completeness and accuracy of the data and the comparability of cases across centres.
Methods: We used a subset of ERCF data relating to seven UK cystic fibrosis (CF) centres. After initial data editing, data on key variables for a sample of patients from five of these centres were verified in detail against the original data sources held at the centres. Disagreements between the ERCF records and the original sources were identified and corrected in the study dataset. In addition, centre staff were questioned about relevant clinical and recording practices.
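The verification step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both the registry extract and the values abstracted from the original centre records are held as dictionaries keyed by a hypothetical `(patient_id, variable)` pair; the function name `verify_against_source` is also hypothetical. For each variable it reports the percentage of compared items that disagree, and separately counts items found in the original sources but absent from the registry, which is one way of flagging possible under-reporting.

```python
from collections import Counter


def verify_against_source(registry, source):
    """Compare registry values with values abstracted from original records.

    Both arguments map a hypothetical (patient_id, variable) key to a value.
    Returns (rates, missing): rates[variable] is the percentage of compared
    items that disagree; missing[variable] counts items present in the
    original source but absent from the registry (possible under-reporting).
    """
    compared, disagreed, missing = Counter(), Counter(), Counter()
    for (patient_id, variable), source_value in source.items():
        key = (patient_id, variable)
        if key not in registry:
            missing[variable] += 1  # recorded at the centre, not in registry
            continue
        compared[variable] += 1
        if registry[key] != source_value:
            disagreed[variable] += 1  # candidate for correction in study data
    rates = {v: 100.0 * disagreed[v] / n for v, n in compared.items()}
    return rates, missing
```

Keeping missing items apart from disagreements reflects the distinction the abstract draws between inaccurate values and under-reported events; how a real study defines each is a design decision, not something this sketch settles.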
Results: Owing to the detailed routine checks that the ERCF applied to key variables, rates of disagreement between ERCF data and the original sources identified in our verification were generally low for the variables assessed (0.4-3.7%). Some outcome variables (deaths, hospitalisations) appeared to be under-reported by some centres. Episodes of pulmonary exacerbation were difficult both to identify and to verify. Twenty-four patients were registered twice (consecutively at two different centres). Centres differed somewhat in their interpretation of the recording rules.
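Double registrations of the kind reported above can be screened for by grouping registrations on a linkage key and flagging keys that occur at more than one centre. The sketch below assumes a hypothetical pseudonymised `linkage_key` (for example, derived from date of birth, sex and initials); the abstract does not say how the duplicates were actually found, and in practice the key available may be constrained by data protection rules.

```python
from collections import defaultdict
from typing import NamedTuple


class Registration(NamedTuple):
    registry_id: str   # identifier assigned at enrolment
    centre: str
    linkage_key: str   # hypothetical pseudonymised patient key
    enrolled: str      # enrolment date as an ISO string, e.g. "1995-03-01"


def find_duplicate_registrations(registrations):
    """Return {linkage_key: registrations ordered by enrolment date}
    for every key registered at more than one centre."""
    by_key = defaultdict(list)
    for reg in registrations:
        by_key[reg.linkage_key].append(reg)
    duplicates = {}
    for key, group in by_key.items():
        if len({reg.centre for reg in group}) > 1:
            # Ordering by date shows consecutive registrations at two centres.
            duplicates[key] = sorted(group, key=lambda reg: reg.enrolled)
    return duplicates
```

Flagged groups would still need manual review before being merged into a single patient history, since a shared key does not by itself prove that two registrations belong to the same person.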
Conclusions: Researchers seeking to use disease registry data should consider a detailed review of data quality. Beyond data accuracy, reliable definitions of both critical events and their timing are important. The degree of under-reporting, particularly of outcome variables, should be estimated. Information on local clinical and reporting practices is necessary to interpret multi-centre data. Data protection issues may limit the scope for detailed quality assessment of secondary data, and the accessibility of original data for verification imposes a further limit. Our experiences and recommendations may be valuable both to those intending to use disease registry data and to those devising and operating such registries.