Estimating the burden of disease. Comparing administrative data and self-reports

Med Care. 1997 Sep;35(9):932-47. doi: 10.1097/00005650-199709000-00006.


Objectives: A cardiovascular health survey of a representative sample of the adult population of Manitoba, Canada, was linked with the provincial health insurance claims database to determine the accuracy of survey questions in detecting cases of diabetes, hypertension, ischemic heart disease, stroke, and hypercholesterolemia.

Methods: Of 2,792 subjects in the survey, 97.7% were linked successfully using a scrambled personal health insurance number. Hospital and physician claims were extracted for these individuals for the 3-year period before the survey.
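The linkage and extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the record layout (`scrambled_id`, `survey_date`, `service_date`, `diagnosis`), the `registry_ids` set, and both function names are assumptions made for the example.

```python
from datetime import date


def lookback_start(survey_date, years=3):
    """Return the date `years` before the survey date (Feb 29 falls back to Feb 28)."""
    try:
        return survey_date.replace(year=survey_date.year - years)
    except ValueError:
        # survey_date is Feb 29 and the target year is not a leap year
        return survey_date.replace(year=survey_date.year - years, day=28)


def link_claims(respondents, registry_ids, claims, years=3):
    """Link survey respondents to claims by a scrambled identifier.

    Respondents whose identifier is absent from the registry are treated as
    unlinked and dropped. Linked respondents keep only the claims dated
    within the look-back window that ends at their survey date.
    """
    # Index claims by identifier so each respondent lookup is O(1).
    claims_by_id = {}
    for claim in claims:
        claims_by_id.setdefault(claim["scrambled_id"], []).append(claim)

    linked = {}
    for person in respondents:
        sid = person["scrambled_id"]
        if sid not in registry_ids:
            continue  # linkage failed for this respondent
        start = lookback_start(person["survey_date"], years)
        end = person["survey_date"]
        linked[sid] = [
            c for c in claims_by_id.get(sid, [])
            if start <= c["service_date"] < end
        ]
    return linked
```

A linked respondent with no claims in the window is kept with an empty list, so the absence of a diagnosis still counts as information when the two sources are compared.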

Results: The authors found no benefit in using restrictive case definitions (ie, requiring more than one diagnosis to define a case). Using additional years of data increased agreement between data sources. Kappa values indicated high levels of agreement between administrative data and self-reports for diabetes (0.72) and hypertension (0.59); kappa values were approximately 0.4 for the other conditions. With administrative data as the "gold standard," specificity was generally very high, although hypertension and hypercholesterolemia (diagnosed primarily by laboratory or physical measurement) showed lower specificity than the other conditions. Sensitivity varied markedly and was lowest for "other heart disease" and "stroke." For diabetes and hypertension, case definitions calling for more than one diagnosis reduced the accuracy of case identification, whereas increasing the number of years of data increased it. For these two conditions, self-reports were fairly accurate in detecting a "true" past history of the illness, based on physician diagnoses recorded on insurance claims.
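For readers unfamiliar with the agreement measures reported above, the following sketch computes Cohen's kappa, sensitivity, and specificity from a 2x2 cross-tabulation of self-report against claims, with claims treated as the reference standard. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def agreement_stats(both_yes, survey_only, claims_only, both_no):
    """Compare self-reports with claims data for one condition.

    both_yes    -- self-report yes, claims yes
    survey_only -- self-report yes, claims no
    claims_only -- self-report no,  claims yes
    both_no     -- self-report no,  claims no
    """
    n = both_yes + survey_only + claims_only + both_no
    if n == 0:
        raise ValueError("empty table")

    # Observed agreement: proportion of subjects on the diagonal.
    observed = (both_yes + both_no) / n

    # Chance agreement from the marginal "yes" rates of each source.
    p_survey_yes = (both_yes + survey_only) / n
    p_claims_yes = (both_yes + claims_only) / n
    expected = (p_survey_yes * p_claims_yes
                + (1 - p_survey_yes) * (1 - p_claims_yes))
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return {
        "kappa": (observed - expected) / (1 - expected),
        # Share of claims-identified cases that also self-report the condition.
        "sensitivity": both_yes / (both_yes + claims_only),
        # Share of claims-negative subjects who also self-report no condition.
        "specificity": both_no / (both_no + survey_only),
    }


# Hypothetical counts for illustration only.
stats = agreement_stats(both_yes=40, survey_only=10, claims_only=20, both_no=130)
print({name: f"{value:.3f}" for name, value in stats.items()})
# → {'kappa': '0.625', 'sensitivity': '0.667', 'specificity': '0.929'}
```

In this made-up table the two sources agree on 85% of subjects, but 60% agreement would be expected by chance alone, which is why kappa (0.625) is considerably lower than the raw agreement.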

Conclusions: This study demonstrates the feasibility of linking a large health survey with administrative data and the validity of self-reports in estimating the prevalence of chronic diseases, especially diabetes and hypertension. A linked data set offers unusual opportunities for epidemiologic and health services research in a defined population.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Cerebrovascular Disorders / epidemiology*
  • Cost of Illness*
  • Diabetes Mellitus / epidemiology*
  • Diagnosis-Related Groups
  • Feasibility Studies
  • Female
  • Health Surveys*
  • Humans
  • Hypercholesterolemia / epidemiology*
  • Hypertension / epidemiology*
  • Insurance Claim Reporting / standards*
  • Male
  • Manitoba / epidemiology
  • Medical Record Linkage*
  • Middle Aged
  • Myocardial Ischemia / epidemiology*
  • Population Surveillance / methods
  • Prevalence
  • Reproducibility of Results
  • Sensitivity and Specificity