Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples

Am J Epidemiol. 2011 Jun 15;173(12):1404-13. doi: 10.1093/aje/kwr001. Epub 2011 May 20.


To reduce bias by residual confounding in nonrandomized database studies, the high-dimensional propensity score (hd-PS) algorithm selects and adjusts for previously unmeasured confounders. The authors evaluated whether hd-PS maintains its capabilities in small cohorts that have few exposed patients or few outcome events. In 4 North American pharmacoepidemiologic cohort studies between 1995 and 2005, the authors repeatedly sampled the data to yield increasingly smaller cohorts. They identified potential confounders in each sample and estimated both an hd-PS that included 0-500 covariates and treatment effects adjusted by decile of hd-PS. For sensitivity analyses, they altered the variable selection process to use zero-cell correction and, separately, to use only the variables' exposure association. With >50 exposed patients with an outcome event, hd-PS-adjusted point estimates in the small cohorts were similar to the full-cohort values. With 25-50 exposed events, both sensitivity analyses yielded estimates closer to those obtained in the full data set. Point estimates generally did not change as compared with the full data set when selecting >300 covariates for the hd-PS. In these data, using zero-cell correction or exposure-based covariate selection allowed hd-PS to function robustly with few events. hd-PS is a flexible analytical tool for nonrandomized research across a range of study sizes and event frequencies.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms
  • Cohort Studies
  • Confounding Factors, Epidemiologic*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Outcome Assessment, Health Care
  • Pharmacoepidemiology*
  • Predictive Value of Tests
  • Propensity Score*
  • Sample Size