Improving measurement of binary covariates in claims data: A simulation study

Pharmacoepidemiol Drug Saf. 2020 Sep;29(9):1093-1100. doi: 10.1002/pds.4961. Epub 2020 Jan 23.

Abstract

Purpose: When investigators have two claims-based definitions for a binary confounder, it is unclear whether to prefer the more sensitive or more specific definition. Our objective was to compare adjusting for the sensitive or specific definition alone vs two novel approaches combining both definitions: a "two-algorithm indicator" and a "two-algorithm restriction" approach.

Methods: Each simulated patient had a binary exposure, outcome, and confounder. We created two nested, misclassified versions of the confounder using validated heart failure definitions. The sensitive definition had a sensitivity/specificity of 0.98/0.83, while the specific definition had a sensitivity/specificity of 0.77/0.99. Patients were classified into 3 groups: group 0 did not meet either definition, group 1 met the sensitive but not specific definition, and group 2 met both. The two-algorithm indicator approach adjusted using indicators for groups 1 and 2, while the two-algorithm restriction approach excluded patients in group 1 and adjusted using an indicator for group 2. Adjusted exposure odds ratios (ORs) were estimated for each approach using logistic regression.
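The misclassification and grouping steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' simulation code. The abstract does not say how the two definitions were made nested, so the sketch assumes one construction. The specific flag is drawn only among patients who already meet the sensitive definition. Its conditional probability is 0.77/0.98 when the true confounder is present and 0.01/0.17 when it is absent, which reproduces the reported marginal sensitivity and specificity. The function names (misclassify, group, indicator_design, restriction_design) are hypothetical.

```python
import random

# Operating characteristics reported in the abstract.
SENS_SENSITIVE, SPEC_SENSITIVE = 0.98, 0.83  # sensitive definition
SENS_SPECIFIC, SPEC_SPECIFIC = 0.77, 0.99    # specific definition


def misclassify(c, rng):
    """Return (sensitive, specific) flags for a patient with true confounder c.

    Assumption: nesting is enforced by drawing the specific flag only among
    patients who meet the sensitive definition, with conditional probabilities
    chosen so the marginal sensitivity/specificity match the targets above.
    """
    if c == 1:
        p_sensitive = SENS_SENSITIVE
        p_specific_given_sensitive = SENS_SPECIFIC / SENS_SENSITIVE
    else:
        p_sensitive = 1 - SPEC_SENSITIVE
        p_specific_given_sensitive = (1 - SPEC_SPECIFIC) / (1 - SPEC_SENSITIVE)
    sensitive = int(rng.random() < p_sensitive)
    specific = int(sensitive == 1 and rng.random() < p_specific_given_sensitive)
    return sensitive, specific


def group(sensitive, specific):
    """0 = meets neither, 1 = sensitive only, 2 = both (specific implies sensitive)."""
    return sensitive + specific


def indicator_design(exposure, groups):
    """Two-algorithm indicator: keep everyone; covariates are exposure plus
    indicators for group 1 and group 2 (group 0 is the reference)."""
    rows = [(e, int(g == 1), int(g == 2)) for e, g in zip(exposure, groups)]
    return rows, list(range(len(groups)))


def restriction_design(exposure, groups):
    """Two-algorithm restriction: drop group 1, then adjust for a group-2 indicator.

    Returns the covariate rows and the indices of retained patients, so the
    caller can subset the outcome vector the same way.
    """
    keep = [i for i, g in enumerate(groups) if g != 1]
    rows = [(exposure[i], int(groups[i] == 2)) for i in keep]
    return rows, keep
```

Either design's rows would then be passed, with the matching outcomes, to a logistic regression whose exposure coefficient gives the adjusted OR.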

Results: The crude OR was 1.33 (95% CI, 1.07-1.63). Adjusting for the specific or sensitive definition resulted in ORs of 1.09 (95% CI, 0.87-1.35) and 1.14 (95% CI, 0.91-1.40), respectively. The two-algorithm indicator approach returned an OR of 1.07 (95% CI, 0.86-1.33). The two-algorithm restriction approach returned an OR of 1.02 (95% CI, 0.79-1.29) but excluded 20% of the cohort.
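The reported ORs and 95% CIs come from logistic regression. The following hedged sketch shows how an OR and a confidence interval can be read off a fitted logistic model. The abstract does not state the interval method, so a Wald interval, exp(β ± 1.96·SE), is assumed here. The fitting routine is a plain Newton-Raphson implementation written for illustration, fit_logistic, expand_2x2 and wald_or_ci are hypothetical helpers, and the 2x2 counts in the usage lines are invented, not the study's data.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    rows: covariate tuples (an intercept is prepended). Returns (beta, se),
    with SEs from the inverse of the observed information matrix.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)  # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Diagonal of info^-1, one unit-vector solve per coefficient.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se


def expand_2x2(a, b, c, d):
    """Turn 2x2 counts (exposed cases/non-cases, unexposed cases/non-cases)
    into per-patient rows of (exposure,) and binary outcomes."""
    rows, y = [], []
    for e, cases, noncases in ((1, a, b), (0, c, d)):
        rows += [(e,)] * (cases + noncases)
        y += [1] * cases + [0] * noncases
    return rows, y


def wald_or_ci(beta_j, se_j, z=1.96):
    """OR and Wald 95% CI for one log-odds coefficient."""
    return math.exp(beta_j), math.exp(beta_j - z * se_j), math.exp(beta_j + z * se_j)


# Usage with invented counts (not the study's data).
rows, y = expand_2x2(120, 380, 100, 400)
beta, se = fit_logistic(rows, y)
print("crude OR and 95% CI:", wald_or_ci(beta[1], se[1]))
```

With a single binary exposure, exp(β) reproduces the cross-product ratio ad/bc and the Wald SE equals sqrt(1/a + 1/b + 1/c + 1/d), a quick check that the fit is correct. Adjusted ORs follow by passing extra covariate columns, such as indicators for groups 1 and 2, alongside exposure.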

Conclusions: The two-algorithm indicator approach may improve adjustment for claims-based confounders, returning a point estimate with bias no greater than that of the better of the two component definitions.

Keywords: administrative claims data; confounder misclassification; confounding; pharmacoepidemiology; simulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Administrative Claims, Healthcare / statistics & numerical data*
  • Algorithms*
  • Cohort Studies
  • Computer Simulation
  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical
  • Drug-Related Side Effects and Adverse Reactions / epidemiology*
  • Drug-Related Side Effects and Adverse Reactions / etiology
  • Humans
  • Odds Ratio
  • Pharmacoepidemiology / methods*
  • Sensitivity and Specificity