Objectives: A cardiovascular health survey of a representative sample of the adult population of Manitoba, Canada, was combined with the provincial health insurance claims database to determine the accuracy of survey questions in detecting cases of diabetes, hypertension, ischemic heart disease, stroke, and hypercholesterolemia.
Methods: Of 2,792 subjects in the survey, 97.7% were linked successfully using a scrambled personal health insurance number. Hospital and physician claims were extracted for these individuals for the 3-year period before the survey.
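The linkage and claims-extraction steps can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' procedure: the `Claim` record, the `registry_ids` set of insured persons, the diagnosis-code set, and the function names `years_before` and `link_and_flag` are all hypothetical. A respondent counts as a claims-based case when at least `min_diagnoses` qualifying claims fall in the look-back window before the survey date.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Claim:
    person_id: str      # hypothetical: scrambled health insurance number
    service_date: date
    diagnosis: str      # hypothetical: diagnosis code recorded on the claim


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February falls back to the 28th."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def link_and_flag(
    survey_ids: Iterable[str],
    registry_ids: Iterable[str],
    claims: Iterable[Claim],
    codes: set[str],
    survey_date: date,
    years: int = 3,
    min_diagnoses: int = 1,
) -> tuple[set[str], set[str]]:
    """Link respondents to insurance records and flag claims-based cases.

    Returns (linked, cases): respondents found in the insurance registry, and
    the subset with at least `min_diagnoses` qualifying claims in the window.
    """
    # Linkage is against the registry, so people with no claims still link.
    linked = set(survey_ids) & set(registry_ids)
    start = years_before(survey_date, years)
    # Count qualifying claims per linked person inside [start, survey_date).
    counts = Counter(
        c.person_id
        for c in claims
        if c.person_id in linked
        and c.diagnosis in codes
        and start <= c.service_date < survey_date
    )
    cases = {pid for pid in linked if counts[pid] >= min_diagnoses}
    return linked, cases
```

Setting `min_diagnoses=2` gives the stricter "more than one diagnosis" definition, and raising `years` widens the look-back window; comparing each resulting case set with self-reports is one way to examine the two effects described in the Results.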
Results: The authors found no benefit in using restrictive case-identification criteria (ie, requiring more than one diagnosis to define a case). Using additional years of data increased agreement between the data sources. Kappa values indicated the highest levels of agreement between administrative data and self-reports for diabetes (0.72) and hypertension (0.59); kappa values were approximately 0.4 for the other conditions. With administrative data as the "gold standard," specificity was generally very high, although hypertension and hypercholesterolemia (diagnosed primarily by laboratory or physical measurement) had lower specificity than the other conditions. Sensitivity varied markedly and was lowest for "other heart disease" and "stroke". For diabetes and hypertension, inclusion criteria calling for more than one diagnosis reduced the accuracy of case identification, whereas increasing the number of years of data increased it. For these two conditions, self-reports were fairly accurate in detecting a "true" past history of the illness, as indicated by physician diagnoses recorded on insurance claims.
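The agreement measures reported in the Results follow from a 2×2 table of self-report against administrative data: with administrative data as the reference, sensitivity is TP/(TP + FN), specificity is TN/(TN + FP), and Cohen's kappa is (p_o − p_e)/(1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance from each source's marginal rates. The sketch below computes all three; the function name `compare`, the `Agreement` record, and the example counts are illustrative assumptions, not study data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    sensitivity: float
    specificity: float
    kappa: float


def _ratio(num: float, den: float) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else math.nan


def compare(self_report: list[bool], admin: list[bool]) -> Agreement:
    """Score self-reports against administrative data treated as the reference."""
    if len(self_report) != len(admin):
        raise ValueError("inputs must describe the same respondents")
    pairs = list(zip(self_report, admin))
    tp = sum(s and a for s, a in pairs)          # both sources flag the condition
    fp = sum(s and not a for s, a in pairs)      # self-report only
    fn = sum(a and not s for s, a in pairs)      # claims only
    tn = sum(not s and not a for s, a in pairs)  # neither source
    n = len(pairs)

    p_o = _ratio(tp + tn, n)  # observed agreement
    # Chance agreement: product of "yes" marginals plus product of "no" marginals.
    p_e = _ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    return Agreement(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        kappa=_ratio(p_o - p_e, 1 - p_e),
    )


# Illustrative counts only: TP = 40, FP = 10, FN = 20, TN = 130.
self_report = [True] * 50 + [False] * 150
admin = [True] * 40 + [False] * 10 + [True] * 20 + [False] * 130
result = compare(self_report, admin)
```

For these counts, sensitivity is 40/60, specificity is 130/140, and kappa is (0.85 − 0.60)/(1 − 0.60) = 0.625, which shows how kappa can sit well below raw agreement when one response category dominates.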
Conclusions: This study demonstrates the feasibility of linking a large health survey with administrative data and the validity of self-reports in estimating the prevalence of chronic diseases, especially diabetes and hypertension. A linked data set offers unusual opportunities for epidemiologic and health services research in a defined population.