J Am Stat Assoc. 2016;111(515):942-947. doi: 10.1080/01621459.2016.1200914. Epub 2016 Oct 18.


Xu, Müller, Wahed, and Thall proposed a Bayesian model to analyze an acute leukemia study involving multi-stage chemotherapy regimes. We discuss two alternative methods, Q-learning and O-learning, that address the same problem from a machine learning point of view. The numerical studies show that these methods can be flexible and, in some situations, offer advantages in handling treatment heterogeneity while remaining robust to model misspecification.

Keywords: Dynamic treatment regimes; Multi-stage chemotherapy regimes; O-learning; Q-learning.