Can routine data be used to support cancer clinical trials? A historical baseline on which to build: retrospective linkage of data from the TACT (CRUK 01/001) breast cancer trial and the National Cancer Data Repository

Trials. 2017 Nov 23;18(1):561. doi: 10.1186/s13063-017-2308-6.


Background: Randomised clinical trials (RCTs) are the gold standard for evaluating new cancer treatments. They are, however, expensive to conduct, particularly where long-term follow-up of participants is required. Tracking participants via routine datasets could provide a cost-effective alternative for ascertaining follow-up information required to evaluate disease outcomes. This project explores the potential for routine data to inform cancer trials, using the historical National Cancer Data Repository (NCDR) for English NHS sites and, for validation, mature data available from the TACT trial.

Methods: Datasets were matched using patients' NHS number, date of birth (dob) and name/initials. Demographics, clinical characteristics and outcomes were assessed for agreement and completeness. Overall survival was compared between NCDR and TACT.
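As a rough illustration of the matching step described in the Methods, the following minimal sketch links one trial record to one registry record on NHS number, date of birth and initials. The abstract does not state the exact matching rule, so the rule in `is_match` (NHS number plus one confirming identifier, or date of birth and initials together when an NHS number is unavailable) is an assumption, as are the `Record` type, the helper names and the example identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal record holding the three linkage identifiers."""
    nhs_number: Optional[str]
    dob: Optional[date]
    initials: Optional[str]


def normalise_nhs(value: Optional[str]) -> Optional[str]:
    """Keep digits only; treat anything that is not 10 digits as missing."""
    if value is None:
        return None
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits if len(digits) == 10 else None


def normalise_initials(value: Optional[str]) -> Optional[str]:
    """Strip punctuation and spaces, and upper-case the remaining letters."""
    if not value:
        return None
    letters = "".join(ch for ch in value if ch.isalpha()).upper()
    return letters or None


def is_match(trial: Record, registry: Record) -> bool:
    """Assumed rule, not the authors' documented algorithm."""
    nhs_a = normalise_nhs(trial.nhs_number)
    nhs_b = normalise_nhs(registry.nhs_number)
    ini_a = normalise_initials(trial.initials)
    ini_b = normalise_initials(registry.initials)
    dob_ok = trial.dob is not None and trial.dob == registry.dob
    ini_ok = ini_a is not None and ini_a == ini_b
    if nhs_a is not None and nhs_b is not None:
        # Both NHS numbers usable: require agreement plus one confirming field.
        return nhs_a == nhs_b and (dob_ok or ini_ok)
    # NHS number unusable on at least one side: require both other fields.
    return dob_ok and ini_ok


# Made-up identifiers for illustration only.
trial_rec = Record("943 476 5919", date(1950, 3, 14), "j.s.")
registry_rec = Record("9434765919", date(1950, 3, 14), "JS")
linked = is_match(trial_rec, registry_rec)
```

Normalising identifiers before comparison matters in practice because spacing and punctuation differences would otherwise be counted as disagreements.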

Results: A total of 3151 patients underwent linkage, of whom 3047 (96.7%) had matched records. Extensive cleaning was required for some registry data fields, e.g. cause of death, whilst others had large amounts of missing data, e.g. tumour size (22.1%). Other data had high levels of matching, such as dob (99.6%) and date of death (89.6%). There was no evidence of differential survival rates (8-year survival: TACT = 75% (95% CI 73, 76); NCDR = 76% (95% CI 74, 77)).
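The abstract does not name the estimator behind the 8-year survival figures. A common choice for this kind of comparison is the Kaplan-Meier product-limit estimator, and the sketch below assumes that method purely for illustration. The function names and the toy follow-up data are hypothetical and are not taken from TACT or NCDR; only the linkage counts (3047 of 3151) come from the text.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  -- follow-up time per patient (e.g. years)
    events -- True if the patient died at that time, False if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Read the step function: survival estimate at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Toy data, invented for illustration only.
toy_times = [2, 3, 3, 5, 8, 9]
toy_events = [True, True, False, True, False, False]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_s8 = survival_at(toy_curve, 8)

# Arithmetic check of the reported linkage rate (counts from the Results).
match_rate = round(100 * 3047 / 3151, 1)
```

Running the same estimator separately on the trial-recorded and registry-recorded death dates, then reading both curves at 8 years, is one way a comparison like the one reported could be set up.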

Conclusions: Data quality and completeness require improvement before routine data could be used for RCTs. Introduction of new routine datasets, including COSD, is welcomed, although reporting of disease-recurrence events remains a concern. Prospective validation of such datasets is required before RCTs can confidently switch patient follow-up to utilise routinely collected NHS-based data.

TACT trial registration: NCT00033683, registered on 9 April 2002; ISRCTN79718493, registered on 1 July 2001.

Keywords: Cancer trials; Randomised controlled trial; Routine data linkage; Validation.

Publication types

  • Comparative Study

MeSH terms

  • Antineoplastic Combined Chemotherapy Protocols / adverse effects
  • Antineoplastic Combined Chemotherapy Protocols / therapeutic use*
  • Breast Neoplasms / drug therapy*
  • Breast Neoplasms / mortality
  • Breast Neoplasms / pathology
  • Data Mining / methods*
  • Databases, Factual
  • Disease Progression
  • Disease-Free Survival
  • Female
  • Health Services Research / methods*
  • Humans
  • Medical Record Linkage / methods*
  • Randomized Controlled Trials as Topic / methods*
  • Registries*
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Factors
  • State Medicine
  • Time Factors
  • Treatment Outcome
  • United Kingdom
