Background: Prevalence studies usually depend on self-report of disease status in survey data or administrative data collections and may over- or under-estimate disease prevalence. The establishment of a linked data collection provided an opportunity to explore the accuracy and completeness of capture of information about diabetes in survey and administrative data collections.
Methods: Baseline questionnaire data at recruitment to the 45 and Up Study were obtained for 266,848 adults aged 45 years and over sampled from New South Wales, Australia in 2006-2009, and linked to administrative data about hospitalisation from the Admitted Patient Data Collection (APDC) for 2000-2009, and to claims for medical services (MBS) and pharmaceuticals (PBS) from Medicare Australia data for 2004-2009. Diabetes status was determined from response to the question 'Has a doctor EVER told you that you have diabetes' (n = 23,981) and augmented by examination of free text fields about diagnosis (n = 119) or use of insulin (n = 58). These data were used to identify the sub-group with type 1 diabetes. We explored the agreement of self-report of diabetes with identification of diabetes diagnostic codes in APDC data, claims for glycosylated haemoglobin (HbA1c) in MBS data, and claims for dispensed medication (oral hypoglycaemic agents and insulin) in PBS data.
Results: Most participants with diabetes were identified in APDC data if admitted to hospital (79.3%), in MBS data with at least one claim for HbA1c testing (84.7%; 73.4% if two tests claimed) or in PBS data through a claim for diabetes medication (71.4%). Using these alternate data collections as an imperfect 'gold standard', we calculated sensitivities of 83.7% for APDC, 63.9% (80.5% for two tests) for MBS, and 96.6% for PBS data, and specificities of 97.7%, 98.4% and 97.1%, respectively. The lower sensitivity for HbA1c may reflect the use of this test to screen for diabetes, suggesting that it is less useful in identifying people with diabetes without additional information. Kappa values were 0.80, 0.70 and 0.80 for APDC, MBS and PBS, respectively, reflecting the large population sample under consideration. Compared with APDC data, there was poor agreement about identifying type 1 diabetes status.
Conclusions: Self-report of diagnosis augmented with free text data indicating diabetes as a chronic condition and/or use of insulin among medications used was able to identify participants with diabetes with high sensitivity and specificity compared to available administrative data collections.
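Illustration: The sensitivity, specificity and kappa values reported in the Results can be derived from a 2x2 cross-tabulation of self-report against each administrative data collection treated as the reference. The following is a minimal sketch under that assumption, not the study's own analysis code; the function name `agreement_metrics` and the cell counts in the usage lines are hypothetical and are not study data.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 table.

    Cell definitions (assumed, with the administrative data as reference):
      tp: self-report yes, reference yes
      fp: self-report yes, reference no
      fn: self-report no,  reference yes
      tn: self-report no,  reference no
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # reference-positive people who self-report
    specificity = tn / (tn + fp)  # reference-negative people who do not

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts chosen only to exercise the function.
metrics = agreement_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is reported alongside sensitivity and specificity because it summarises overall agreement beyond chance without requiring either source to be treated as error-free, which suits an imperfect 'gold standard'.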