Algorithms for the Capture and Adjudication of Prevalent and Incident Diabetes in UK Biobank

PLoS One. 2016 Sep 15;11(9):e0162388. doi: 10.1371/journal.pone.0162388. eCollection 2016.


Objectives: UK Biobank is a UK-wide cohort of 502,655 people aged 40-69 years, recruited from National Health Service registrants between 2006 and 2010, with healthcare data linkage. Type 2 diabetes is a key exposure and outcome. We developed algorithms to define prevalent and incident diabetes for UK Biobank. The algorithms will be implemented by UK Biobank and their results made available to researchers on request.

Methods: We used UK Biobank self-reported medical history and medication to assign prevalent diabetes status and type, and tested this against linked primary and secondary care data in Welsh UK Biobank participants. Additionally, we derived and tested algorithms for incident diabetes using linked primary and secondary care data in the English Clinical Practice Research Datalink, and ran these on secondary care data in UK Biobank.

Results and significance: For prevalent diabetes, 0.001% and 0.002% of people classified as "diabetes unlikely" in UK Biobank had evidence of diabetes in their primary or secondary care records, respectively. Of those classified as "probable" type 2 diabetes, 75% and 96% had specific type 2 diabetes codes in their primary and secondary care records, respectively. For incidence, 95% of people with the type 2 diabetes-specific C10F Read code in primary care had corroborative evidence of diabetes from medications, blood testing or diabetes-specific process-of-care codes. Only 41% of people identified with type 2 diabetes in primary care had secondary care evidence of type 2 diabetes. In contrast, of incident cases identified using ICD-10 type 2 diabetes-specific codes in secondary care, 77% had corroborative evidence of diabetes in primary care. We suggest that our definition of prevalent diabetes from UK Biobank baseline data has external validity, and recommend that specific primary care Read codes be used for incident diabetes to ensure precision. Secondary care data should be used for incident diabetes with caution, as around half of all cases are missed, and a quarter have no corroborative evidence of diabetes in primary care.
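The corroboration percentages reported above are proportions of code-identified cases that also have independent evidence of diabetes. The following is a minimal sketch of that calculation only, not the authors' algorithm: the `Participant` record, its field names, and the toy cohort are hypothetical and stand in for linked data that the abstract does not describe in detail.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Participant:
    # All fields are hypothetical flags, assumed to be derived upstream
    # from linked health records.
    has_t2d_code: bool               # type 2 diabetes-specific code present
    on_diabetes_medication: bool     # corroboration: medication
    has_diagnostic_blood_test: bool  # corroboration: blood testing
    has_process_of_care_code: bool   # corroboration: process-of-care code


def is_corroborated(p: Participant) -> bool:
    """True if any one of the three evidence sources is present."""
    return (
        p.on_diabetes_medication
        or p.has_diagnostic_blood_test
        or p.has_process_of_care_code
    )


def corroborated_percentage(cohort: Iterable[Participant]) -> Optional[float]:
    """Percentage of code-identified cases with corroborative evidence.

    Returns None when the cohort contains no code-identified cases.
    """
    cases = [p for p in cohort if p.has_t2d_code]
    if not cases:
        return None
    corroborated = sum(is_corroborated(p) for p in cases)
    return 100.0 * corroborated / len(cases)


# Toy data for illustration only; these are not study results.
toy_cohort = [
    Participant(True, True, False, False),
    Participant(True, False, True, False),
    Participant(True, False, False, True),
    Participant(True, False, False, False),   # coded but uncorroborated
    Participant(False, True, False, False),   # not a coded case; ignored
]

print(corroborated_percentage(toy_cohort))
```

Participants without the specific code are excluded from the denominator, so the figure describes the precision of the code among identified cases rather than how many true cases the code captures.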

MeSH terms

  • Aged
  • Algorithms*
  • Biological Specimen Banks*
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Female
  • Humans
  • Incidence
  • Male
  • Middle Aged
  • Prevalence
  • United Kingdom / epidemiology