High-dimensional propensity score adjustment in studies of treatment effects using health care claims data

Epidemiology. 2009 Jul;20(4):512-22. doi: 10.1097/EDE.0b013e3181a663cc.


Background: Adjusting for large numbers of covariates ascertained from patients' health care claims data may improve control of confounding, as these variables may collectively be proxies for unobserved factors. Here, we develop and test an algorithm that empirically identifies candidate covariates, prioritizes covariates, and integrates them into a propensity-score-based confounder adjustment model.

Methods: We developed a multistep algorithm to implement high-dimensional proxy adjustment in claims data. Steps include (1) identifying data dimensions, e.g., diagnoses, procedures, and medications; (2) empirically identifying candidate covariates; (3) assessing recurrence of codes; (4) prioritizing covariates; (5) selecting covariates for adjustment; (6) estimating the exposure propensity score; and (7) estimating an outcome model. This algorithm was tested in Medicare claims data, including a study of whether Cox-2 inhibitors reduce gastric toxicity compared with nonselective nonsteroidal anti-inflammatory drugs (NSAIDs).
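The abstract names the steps but not their rules. The Python sketch below illustrates steps 2 to 5 under assumptions taken from common descriptions of the high-dimensional propensity score (hdPS), not from this abstract: candidate codes are the `n_top` most prevalent codes per data dimension; recurrence is encoded as "once", "sporadic" (count at least the median) and "frequent" (count at least the 75th percentile) indicators; covariates are ranked by the absolute log of the Bross multiplicative bias; and the top `k` are kept. All names (`Patient`, `select_covariates`, etc.), the default `n_top=200`, and the toy data are hypothetical. Steps 6 and 7 (fitting the propensity score model with the selected covariates, then the outcome model) are not shown.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical record: exposure, outcome, and claim codes per dimension.

    A code is listed once per claim, so repeats encode recurrence.
    """
    exposed: bool
    outcome: bool
    codes: dict[str, list[str]] = field(default_factory=dict)


def candidate_codes(patients, dimension, n_top):
    """Step 2 (assumed rule): the n_top codes found in the most patients."""
    prevalence = Counter()
    for p in patients:
        prevalence.update(set(p.codes.get(dimension, [])))
    return [code for code, _ in prevalence.most_common(n_top)]


def _percentile(values, q):
    """Nearest-rank percentile of a non-empty list, q in (0, 1]."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(q * len(ordered))) - 1]


def recurrence_covariates(patients, dimension, code):
    """Step 3 (assumed rule): binary 'once', 'sporadic' (count >= median) and
    'frequent' (count >= 75th percentile) indicators, with thresholds taken
    among patients who have the code at least once."""
    counts = [p.codes.get(dimension, []).count(code) for p in patients]
    positive = [c for c in counts if c > 0]
    if not positive:
        return {}
    thresholds = {"once": 1,
                  "sporadic": _percentile(positive, 0.50),
                  "frequent": _percentile(positive, 0.75)}
    covariates, seen = {}, set()
    for label, t in thresholds.items():
        if t in seen:  # skip an indicator identical to one already built
            continue
        seen.add(t)
        covariates[f"{dimension}:{code}:{label}"] = [c >= t for c in counts]
    return covariates


def bross_bias(indicator, patients):
    """Step 4: Bross multiplicative bias for one binary covariate,
    (P_C1 (RR_CD - 1) + 1) / (P_C0 (RR_CD - 1) + 1)."""
    exp_c = [c for c, p in zip(indicator, patients) if p.exposed]
    unexp_c = [c for c, p in zip(indicator, patients) if not p.exposed]
    with_c = [p.outcome for c, p in zip(indicator, patients) if c]
    without_c = [p.outcome for c, p in zip(indicator, patients) if not c]
    if not (exp_c and unexp_c and sum(with_c) and sum(without_c)):
        return 1.0  # empty group or zero/undefined RR_CD: treat as uninformative
    p_c1 = sum(exp_c) / len(exp_c)      # covariate prevalence among exposed
    p_c0 = sum(unexp_c) / len(unexp_c)  # covariate prevalence among unexposed
    rr_cd = (sum(with_c) / len(with_c)) / (sum(without_c) / len(without_c))
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def select_covariates(patients, dimensions, n_top=200, k=500):
    """Steps 2 to 5: build candidates per dimension, rank by |log(bias)|,
    and keep the top k."""
    candidates = {}
    for dim in dimensions:
        for code in candidate_codes(patients, dim, n_top):
            candidates.update(recurrence_covariates(patients, dim, code))
    strength = {name: abs(math.log(bross_bias(ind, patients)))
                for name, ind in candidates.items()}
    return sorted(strength, key=strength.get, reverse=True)[:k]


# Toy, hypothetical data: code "A" tracks exposure and outcome; "B" does not.
toy_patients = [
    Patient(True, True, {"dx": ["A", "A", "B"]}),
    Patient(True, True, {"dx": ["A", "B"]}),
    Patient(True, False, {"dx": ["A"]}),
    Patient(False, False, {"dx": ["B"]}),
    Patient(False, False, {"dx": ["B"]}),
    Patient(False, True, {"dx": []}),
]

if __name__ == "__main__":
    print(select_covariates(toy_patients, ["dx"], k=2))
    # → ['dx:A:once', 'dx:A:frequent']
```

Ranking by the Bross bias lets a covariate score highly only if it is both imbalanced between exposure groups and associated with the outcome; code "B" above is equally prevalent in both groups, so its bias is 1 and it ranks last.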

Results: In a population of 49,653 new users of Cox-2 inhibitors or nonselective NSAIDs, a crude relative risk (RR) for upper gastrointestinal (GI) toxicity of 1.09 (95% confidence interval = 0.91-1.30) was initially observed. Adjusting for 15 predefined covariates resulted in a possible gastroprotective effect (0.94 [0.78-1.12]). The gastroprotective effect became stronger when adjusting for an additional 500 algorithm-derived covariates (0.88 [0.73-1.06]). Results of a study on the effect of statins on mortality were similar. Adjustment with the algorithm confirmed a null association between influenza vaccination and hip fracture (1.02 [0.85-1.21]).
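The reported intervals can be checked with simple arithmetic. Assuming (the abstract does not say so) that each 95% confidence interval is a Wald interval on the log-RR scale, the point estimate should be close to the geometric mean of the bounds, and the standard error of log RR is (ln U - ln L) / (2 × 1.96). The function names below are illustrative only.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def log_se_from_ci(lower, upper, z=Z_975):
    """SE of log(RR) implied by a 95% Wald interval on the log scale (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_midpoint(lower, upper):
    """Geometric mean of the bounds; equals the RR for a log-scale Wald CI."""
    return math.sqrt(lower * upper)


# Upper GI toxicity estimates as reported in the abstract: (RR, lower, upper)
reported = {
    "crude": (1.09, 0.91, 1.30),
    "15 predefined covariates": (0.94, 0.78, 1.12),
    "plus 500 algorithm-derived covariates": (0.88, 0.73, 1.06),
}

if __name__ == "__main__":
    for label, (rr, lo, hi) in reported.items():
        print(f"{label}: RR {rr}, CI midpoint {ci_midpoint(lo, hi):.2f}, "
              f"SE(log RR) {log_se_from_ci(lo, hi):.3f}")
```

With the published two-decimal bounds, the midpoints agree with the reported RRs up to rounding (0.93 vs 0.94 for the 15-covariate model), and the implied standard errors all fall between 0.09 and 0.10. Under this assumption, adding the 500 algorithm-derived covariates shifted the estimate without much loss of precision.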

Conclusions: In typical pharmacoepidemiologic studies, the proposed high-dimensional propensity score resulted in improved effect estimates compared with adjustment limited to predefined covariates, when benchmarked against results expected from randomized trials.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Anti-Inflammatory Agents, Non-Steroidal / adverse effects
  • Anti-Inflammatory Agents, Non-Steroidal / therapeutic use
  • Confounding Factors, Epidemiologic*
  • Cyclooxygenase 2 Inhibitors / adverse effects
  • Cyclooxygenase 2 Inhibitors / therapeutic use
  • Female
  • Humans
  • Insurance Claim Review / statistics & numerical data*
  • Male
  • Medicare / statistics & numerical data
  • Pharmacoepidemiology / statistics & numerical data
  • Risk Assessment
  • Treatment Outcome
  • United States
  • Upper Gastrointestinal Tract / drug effects

Substances

  • Anti-Inflammatory Agents, Non-Steroidal
  • Cyclooxygenase 2 Inhibitors