Cancer incidence in The Health Improvement Network

Pharmacoepidemiol Drug Saf. 2009 Aug;18(8):730-6. doi: 10.1002/pds.1774.


Background: The utility of electronic medical record databases for clinical research relies on the validity and completeness of the recorded medical diagnoses. This study assessed whether the recorded incidence of cancer among patients in The Health Improvement Network (THIN) database is comparable to that expected in the UK based on national cancer registry data.

Methods: We examined incidence rates from 1992 to 2007 for any cancer other than non-melanoma skin cancer and for four specific cancers: colorectal cancer, lung cancer, pancreatic cancer, and lymphoma. Standardized incidence ratios (SIRs) were calculated by indirect standardization, using age- and sex-specific rates from the UK cancer registry for England and Wales for the corresponding years.
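
As an illustration of indirect standardization in general (not the authors' actual code or data), the sketch below computes a SIR as observed cases divided by the cases expected if reference age- and sex-specific rates applied to the cohort's person-time. The stratum labels, person-years, rates, and case counts are hypothetical values chosen for the example, and the `Stratum` class and function name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One age/sex stratum of the study cohort (all values hypothetical)."""
    label: str
    person_years: float      # follow-up time contributed by the cohort
    reference_rate: float    # registry rate, in cases per person-year
    observed_cases: int      # cases recorded in the cohort


def standardized_incidence_ratio(strata):
    """Return observed / expected cases under indirect standardization."""
    observed = sum(s.observed_cases for s in strata)
    # Expected cases: apply each reference rate to the cohort's person-time.
    expected = sum(s.person_years * s.reference_rate for s in strata)
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Hypothetical data, not taken from THIN or the cancer registry.
strata = [
    Stratum("men 40-59", 10_000, 0.002, 19),
    Stratum("women 40-59", 10_000, 0.003, 28),
    Stratum("men 60-79", 5_000, 0.010, 48),
    Stratum("women 60-79", 5_000, 0.008, 38),
]

sir = standardized_incidence_ratio(strata)
within_10_percent = abs(sir - 1.0) <= 0.10
print(f"SIR = {sir:.2f}, within 10% of unity: {within_10_percent}")
```

With these made-up numbers, 133 observed cases against 140 expected give a SIR of 0.95. A SIR below 1 means fewer cases were recorded in the cohort than the reference rates predict.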

Results: Recording of the incidence of all cancers combined in THIN was very close to the expected rates from 2001 to 2007, with SIRs within 10% of unity. Recording of the solid cancers was lower than expected based on cancer registry data, although the SIR for each cancer exceeded 0.80 in 2007. Recording of lymphoma was close to the expected rate for the entire follow-up period. Time and experience with Vision software emerged as important factors in reported incidence rates for all cancers.

Conclusions: For all cancers combined and for lymphoma, the observed rates in THIN are very close to those reported in cancer registry data for the years 2001-2007. For solid cancers, however, the observed rates in THIN are below those reported in cancer registry data. This may reflect the use of non-specific codes to record solid cancers.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Age Distribution
  • Aged
  • Aged, 80 and over
  • Child
  • Child, Preschool
  • Epidemiologic Research Design
  • Female
  • Humans
  • Incidence
  • Infant
  • Infant, Newborn
  • International Classification of Diseases
  • Male
  • Medical Records Systems, Computerized / standards
  • Medical Records Systems, Computerized / statistics & numerical data*
  • Middle Aged
  • Neoplasms / epidemiology*
  • Registries / standards
  • Registries / statistics & numerical data*
  • Reproducibility of Results
  • Sex Distribution
  • Time Factors
  • United Kingdom / epidemiology
  • United States / epidemiology
  • Young Adult