The usefulness of administrative databases for identifying disease cohorts is increased with a multivariate model

J Clin Epidemiol. 2010 Dec;63(12):1332-41. doi: 10.1016/j.jclinepi.2010.01.016. Epub 2010 May 8.


Background: Administrative databases commonly use codes to indicate diagnoses, but these codes alone are often inadequate for accurately identifying patients with particular conditions. In this study, we determined whether we could quantify the probability that a person has a particular disease (in this case, renal failure) using other routinely collected information available in an administrative data set. This would allow the accurate identification of a disease cohort in an administrative database.

Methods: We determined whether patients in 100,000 randomly selected hospitalizations had kidney disease (defined as two or more sequential serum creatinines or the single admission creatinine indicating a calculated glomerular filtration rate less than 60 mL/min/1.73 m²). The independent association of patient- and hospitalization-level variables with renal failure was measured using a multivariate logistic regression model fitted in a random 50% sample of the patients. The model was then validated in the remaining patients.
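The split-sample design and the logistic model described in the Methods can be illustrated with a minimal sketch. It assumes a simple random halving of patient identifiers and a standard logistic link. The function names (`split_half`, `predicted_probability`), the variable names, and every coefficient value are hypothetical and are not taken from the study.

```python
import math
import random


def split_half(ids, seed=0):
    """Randomly split identifiers into derivation and validation halves."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for illustration only (NOT the study's estimates).
example_coefficients = {"renal_code": 2.5, "age_decades": 0.3, "diabetes": 0.6}

# Ten toy patient identifiers split into two halves of five.
derivation, validation = split_half(range(10), seed=42)

# Predicted probability of kidney disease for one hypothetical patient.
p = predicted_probability(
    -4.0,
    example_coefficients,
    {"renal_code": 1, "age_decades": 7, "diabetes": 0},
)
```

In a split-sample design of this kind, coefficients would be estimated on the derivation half only, and the resulting probabilities would then be compared with observed disease status in the validation half.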

Results: Twenty thousand seven hundred thirteen patients had kidney disease (20.7%). A diagnostic code of kidney disease was strongly associated with kidney disease (relative risk: 34.4), but the accuracy of the code was poor (sensitivity: 37.9%; specificity: 98.9%). Twenty-nine patient- and hospitalization-level variables entered the kidney disease model. This model had excellent discrimination (c-statistic: 90.1%) and accurately predicted the probability of true renal failure. The probability threshold that maximized sensitivity and specificity for the identification of true kidney disease was 21.3% (sensitivity: 80.0%; specificity: 82.2%).
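The accuracy measures reported in the Results (sensitivity, specificity, c-statistic, and an optimal probability threshold) can be computed from predicted probabilities and true disease status as sketched here. The abstract does not state the exact criterion used to choose the threshold; this sketch assumes it maximizes the sum of sensitivity and specificity (Youden's index). The function names and toy data are hypothetical and are not the study's data.

```python
def sensitivity_specificity(labels, probs, threshold):
    """Classify as diseased when probability >= threshold; return (sens, spec)."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def c_statistic(labels, probs):
    """Share of (case, non-case) pairs ranked correctly; ties count as half."""
    cases = [p for y, p in zip(labels, probs) if y == 1]
    controls = [p for y, p in zip(labels, probs) if y == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def best_threshold(labels, probs):
    """Threshold maximizing sensitivity + specificity (assumed criterion)."""
    best = None
    for t in sorted(set(probs)):
        sens, spec = sensitivity_specificity(labels, probs, t)
        if best is None or sens + spec > best[0]:
            best = (sens + spec, t, sens, spec)
    return best[1], best[2], best[3]


# Toy data for illustration only: 1 = true kidney disease, 0 = no disease.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.70, 0.90]

threshold, sens, spec = best_threshold(labels, probs)
auc = c_statistic(labels, probs)
```

A pairwise c-statistic like this one is quadratic in the number of patients, so a rank-based computation would be preferable for a validation sample of the size described here.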

Conclusion: Multiple variables available in administrative databases can be combined to quantify the probability that a person has a particular disease. This process permits accurate identification of a disease cohort in an administrative database. These methods may be extended to other diagnoses or procedures and could both facilitate and clarify the use of administrative databases for research and quality improvement.

MeSH terms

  • Aged
  • Canada / epidemiology
  • Cohort Studies
  • Databases, Factual / statistics & numerical data*
  • Epidemiologic Research Design
  • Female
  • Hospitalization / statistics & numerical data*
  • Humans
  • International Classification of Diseases
  • Kidney Failure, Chronic / classification
  • Kidney Failure, Chronic / diagnosis
  • Kidney Failure, Chronic / epidemiology*
  • Logistic Models
  • Male
  • Medical Records / statistics & numerical data
  • Probability
  • Reproducibility of Results