A comparison of two methods of estimating propensity scores after multiple imputation

Stat Methods Med Res. 2016 Feb;25(1):188-204. doi: 10.1177/0962280212445945. Epub 2012 Jun 11.


In many observational studies, analysts estimate treatment effects using propensity scores, e.g., by matching or sub-classifying on the scores. When some covariate values are missing, analysts can use multiple imputation to fill in the missing data, estimate propensity scores from the m completed datasets, and use those scores to estimate treatment effects. We compare two approaches to implementing this process. In the first, the analyst estimates the treatment effect using propensity score matching within each completed dataset and averages the m treatment effect estimates. In the second, the analyst averages the m propensity scores for each record across the completed datasets and performs propensity score matching on these averaged scores to estimate the treatment effect. We compare the properties of both methods via simulation studies using artificial and real data. The simulations suggest that the second method has greater potential to produce substantial bias reductions than the first, particularly when the missing values are predictive of treatment assignment.
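The two approaches described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`fit_propensity`, `att_by_matching`, `within_estimate`, `across_estimate`) are hypothetical, the propensity model is assumed to be a plain logistic regression, matching is assumed to be 1-to-1 nearest-neighbour with replacement targeting the effect on the treated, and the toy imputation step is a crude random draw standing in for a proper multiple imputation procedure.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Logistic regression via Newton's method; returns fitted P(T=1 | X)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = Xd.T @ (Xd * w[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta += np.linalg.solve(hess, Xd.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def att_by_matching(ps, t, y):
    """Effect on the treated from 1-to-1 nearest-neighbour matching on ps."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    matches = control[dist.argmin(axis=1)]  # closest control, with replacement
    return float(np.mean(y[treated] - y[matches]))

def within_estimate(completed, t, y):
    """First approach: match within each completed dataset, average the m effects."""
    effects = [att_by_matching(fit_propensity(X, t), t, y) for X in completed]
    return float(np.mean(effects))

def across_estimate(completed, t, y):
    """Second approach: average the m propensity scores per record, match once."""
    mean_ps = np.mean([fit_propensity(X, t) for X in completed], axis=0)
    return att_by_matching(mean_ps, t, y)

# Toy data (entirely artificial, for illustration only).
rng = np.random.default_rng(0)
n, m = 400, 5
X_full = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * X_full[:, 0] - 0.5 * X_full[:, 1])))
t = rng.binomial(1, p_treat)
y = X_full.sum(axis=1) + 1.0 * t + rng.normal(size=n)

# Make about 30% of the first covariate missing, then build m completed datasets.
# NOTE: drawing from the observed marginal distribution is a placeholder,
# not a proper multiple imputation model.
miss = rng.random(n) < 0.3
obs = X_full[~miss, 0]
completed = []
for _ in range(m):
    X_imp = X_full.copy()
    X_imp[miss, 0] = rng.normal(obs.mean(), obs.std(), size=miss.sum())
    completed.append(X_imp)

print("within-imputation estimate:", within_estimate(completed, t, y))
print("across-imputation estimate:", across_estimate(completed, t, y))
```

The only structural difference between the two functions is where the averaging happens: `within_estimate` averages m treatment effect estimates, whereas `across_estimate` averages m propensity scores per record and matches once. With a single completed dataset the two coincide.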

Keywords: Missing data; multiple imputation; observational studies; propensity score.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bias
  • Biostatistics
  • Breast Feeding / statistics & numerical data
  • Child
  • Child Development
  • Child, Preschool
  • Computer Simulation
  • Humans
  • Infant
  • Infant, Newborn
  • Models, Statistical*
  • Observational Studies as Topic / statistics & numerical data
  • Propensity Score*
  • Treatment Outcome