Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators

Stat Methods Med Res. 2019 Jun;28(6):1637-1650. doi: 10.1177/0962280218772065. Epub 2018 May 2.


Data-adaptive methods have been proposed to estimate nuisance parameters when using doubly robust semiparametric methods for estimating marginal causal effects. However, in the presence of near practical positivity violations, these methods can separate the propensity score densities of the two exposure groups, which can lead to biased estimates of the treatment effect. To motivate the problem, we evaluated the Targeted Minimum Loss-based Estimation (TMLE) procedure for the average treatment effect in a simulation scenario. We highlight the divergence between the estimates obtained when the propensity score is estimated with parametric versus data-adaptive methods. We then adapted an existing diagnostic tool, based on bootstrap resampling of the subjects and simulation of the outcome data, to show that using data-adaptive methods to estimate the propensity score in this study may lead to large bias and poor coverage. The adapted bootstrap procedure can identify this instability and may therefore be used as a diagnostic tool.
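The resample-and-simulate idea behind the diagnostic can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: it swaps TMLE with a Super Learner for a simpler Hájek-normalized IPTW estimator with a parametric logistic propensity score, uses a single simulated covariate, and every function name (`fit_logistic`, `iptw_ate`, `bootstrap_bias`) is hypothetical. An outcome model fitted once on the observed data defines a "bootstrap world" whose ATE is known; subjects' (W, A) pairs are resampled, outcomes are simulated from that model, and the average gap between the re-estimated ATE and the known ATE estimates the estimator's bias.

```python
# Hypothetical sketch of a resample-and-simulate bias diagnostic (not the paper's code).
import numpy as np


def expit(x):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column.
    A tiny ridge term keeps the Hessian invertible when the fit is nearly separated."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def design(w, a=None):
    """Design matrix [1, (A,) W] for a single covariate W."""
    cols = [np.ones_like(w)] + ([a] if a is not None else []) + [w]
    return np.column_stack(cols)


def iptw_ate(w, a, y):
    """Hajek-normalized IPTW estimate of the ATE with a parametric logistic propensity score.
    Stands in for TMLE + Super Learner, which this sketch does not implement."""
    g = expit(design(w) @ fit_logistic(design(w), a))
    treated = np.sum(a * y / g) / np.sum(a / g)
    control = np.sum((1 - a) * y / (1 - g)) / np.sum((1 - a) / (1 - g))
    return treated - control


def bootstrap_bias(w, a, y, estimator, n_boot=200, seed=0):
    """Resample subjects, simulate Y from a fitted outcome model, and compare the
    estimator's bootstrap mean with the ATE implied by that outcome model."""
    rng = np.random.default_rng(seed)
    q_beta = fit_logistic(design(w, a), y)                  # fitted outcome model Q(A, W)
    q1 = expit(design(w, np.ones_like(a)) @ q_beta)
    q0 = expit(design(w, np.zeros_like(a)) @ q_beta)
    truth = np.mean(q1 - q0)                                 # known ATE in the bootstrap world
    n = len(w)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                     # resample (W, A) jointly
        wb, ab = w[idx], a[idx]
        yb = rng.binomial(1, expit(design(wb, ab) @ q_beta)).astype(float)  # simulate Y
        estimates[b] = estimator(wb, ab, yb)
    return truth, estimates.mean() - truth, estimates


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    w = rng.normal(size=n)
    a = rng.binomial(1, expit(2.0 * w)).astype(float)        # strong confounding, thin overlap
    y = rng.binomial(1, expit(-0.5 + a + w)).astype(float)
    truth, bias, est = bootstrap_bias(w, a, y, iptw_ate)
    print(f"bootstrap-world ATE {truth:.3f}, estimated bias {bias:.3f}, SD {est.std():.3f}")
```

Because `bootstrap_bias` takes the estimator as an argument, the same loop could in principle be run once with a parametric propensity score and once with a data-adaptive one, which loosely mirrors the parametric versus data-adaptive comparison in the abstract; a large gap between the two bias estimates would be the kind of instability the diagnostic is meant to flag.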

Keywords: Causal inference; IPTW; TMLE; doubly robust; positivity; super learner.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias*
  • Causality
  • Humans
  • Machine Learning*
  • Models, Statistical
  • Probability
  • Propensity Score

Grant support