Thirteen Questions About Using Machine Learning in Causal Research (You Won't Believe the Answer to Number 10!)

Am J Epidemiol. 2021 Aug 1;190(8):1476-1482. doi: 10.1093/aje/kwab047.


Machine learning is gaining prominence in the health sciences, where much of its use has focused on data-driven prediction. However, machine learning can also be embedded within causal analyses, potentially reducing biases arising from model misspecification. Using a question-and-answer format, we provide an introduction and orientation for epidemiologists interested in using machine learning but concerned about potential bias or loss of rigor from the use of "black box" models. We conclude with sample software code that may lower the barrier to entry for these techniques.

Keywords: causal inference; double-robustness; epidemiologic methods; inverse probability weighting; machine learning; propensity score; targeted maximum likelihood estimation.
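
As a minimal sketch of the double-robustness idea named in the keywords (not the article's own sample code), the Python example below computes an augmented inverse probability weighted (AIPW) estimate of the average treatment effect on simulated data. The data-generating process, the true effect of 2.0, every function name, and the use of simple stratum means in place of a machine learning learner are assumptions made for illustration only.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Simulate (w, a, y) rows: binary confounder w, binary treatment a.

    Hypothetical data-generating process with a true average treatment
    effect of 2.0; w raises both the chance of treatment and the outcome.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        p_treat = 0.7 if w == 1 else 0.3
        a = 1 if rng.random() < p_treat else 0
        y = 2.0 * a + 3.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows


def fit_propensity(rows):
    """Estimate P(A=1 | W=w) by the treated fraction within each stratum of w."""
    return {w: mean(a for (wi, a, _) in rows if wi == w) for w in (0, 1)}


def fit_outcome(rows, use_confounder=True):
    """Estimate E[Y | A=a, W=w] by stratum means.

    With use_confounder=False the model ignores w, i.e. it is
    deliberately misspecified (stand-in for a poorly chosen learner).
    """
    model = {}
    for a in (0, 1):
        for w in (0, 1):
            ys = [y for (wi, ai, y) in rows
                  if ai == a and (wi == w or not use_confounder)]
            model[(a, w)] = mean(ys)
    return model


def aipw_ate(rows, g, m):
    """AIPW estimate: outcome-model contrast plus weighted residual corrections."""
    terms = []
    for w, a, y in rows:
        m1, m0 = m[(1, w)], m[(0, w)]
        corr1 = a * (y - m1) / g[w]
        corr0 = (1 - a) * (y - m0) / (1.0 - g[w])
        terms.append(m1 - m0 + corr1 - corr0)
    return mean(terms)


rows = simulate(20000, seed=1)
g_hat = fit_propensity(rows)

# Unadjusted contrast, confounded by w.
naive = (mean(y for _, a, y in rows if a == 1)
         - mean(y for _, a, y in rows if a == 0))

# Both nuisance models account for w.
ate_good = aipw_ate(rows, g_hat, fit_outcome(rows, use_confounder=True))

# Outcome model ignores w, propensity model does not.
ate_bad_outcome = aipw_ate(rows, g_hat, fit_outcome(rows, use_confounder=False))

print("naive difference:", naive)
print("AIPW, both models adjusted:", ate_good)
print("AIPW, outcome model misspecified:", ate_bad_outcome)
```

Under these assumptions, the naive contrast is confounded by w, while the AIPW estimate should stay near the simulated effect even when the outcome model ignores w, because the propensity model still accounts for it. In an applied analysis the stratum means would be replaced by flexible learners, typically combined with sample splitting.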

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Bias
  • Causality*
  • Data Interpretation, Statistical*
  • Epidemiologic Methods*
  • Humans
  • Machine Learning*