Comparing type 1 and type 2 error rates of different tests for heterogeneous treatment effects

Steffen Nestler; Marie Salditt

doi:10.3758/s13428-024-02371-x

Behav Res Methods. 2024 Mar 20. doi: 10.3758/s13428-024-02371-x. Online ahead of print.

Authors

Steffen Nestler¹, Marie Salditt²

Affiliations

¹ University of Münster, Institut für Psychologie, Fliednerstr. 21, 48149, Münster, Germany. steffen.nestler@uni-muenster.de.
² University of Münster, Institut für Psychologie, Fliednerstr. 21, 48149, Münster, Germany.

PMID: 38509268
DOI: 10.3758/s13428-024-02371-x

Abstract

Psychologists are increasingly interested in whether treatment effects vary in randomized controlled trials. The causal inference literature has proposed a number of tests for such heterogeneity. These tests differ in the sample statistic they use (the variances of the experimental and control groups, their empirical distribution functions, or specific quantiles) and in whether they rely on distributional assumptions or on a Fisher randomization procedure. In this manuscript, we present the results of a simulation study that examines the performance of the different tests while varying the amount of treatment effect heterogeneity, the type of underlying distribution, the sample size, and whether an additional covariate is considered. Altogether, our results suggest that researchers should use a randomization test to best control the type 1 error rate. Furthermore, all tests studied have low power in small and moderate samples, even when the heterogeneity of the treatment effect is substantial. This suggests that current tests for treatment effect heterogeneity require much larger samples than those collected in current research.

Keywords: Causality; Heterogeneous regression; Heterogeneous treatment effects; Randomization tests.
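
To make the idea of a variance-based randomization test concrete, the following is a minimal Python sketch and not the authors' procedure. It assumes a two-arm trial, plugs in the estimated average treatment effect so that the groups are approximately exchangeable under the null hypothesis of a constant effect, and uses the absolute log variance ratio as the test statistic. The function name `variance_ratio_randomization_test`, its parameters, and the simulated data are illustrative assumptions.

```python
import numpy as np


def variance_ratio_randomization_test(y, treated, n_perm=2000, seed=0):
    """Randomization test for unequal outcome variances between two arms.

    Hypothetical helper for illustration; not taken from the article.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    # Under a constant treatment effect, subtracting the (estimated) average
    # effect from the treated outcomes leaves both arms with the same
    # distribution. Using an estimate makes the test approximate.
    ate_hat = y[treated].mean() - y[~treated].mean()
    y_adj = y - ate_hat * treated

    def stat(labels):
        # Absolute log ratio of sample variances: 0 when variances are equal.
        var_t = y_adj[labels].var(ddof=1)
        var_c = y_adj[~labels].var(ddof=1)
        return abs(np.log(var_t / var_c))

    observed = stat(treated)

    # Re-randomize the treatment labels and recompute the statistic.
    rng = np.random.default_rng(seed)
    perm_stats = np.array(
        [stat(rng.permutation(treated)) for _ in range(n_perm)]
    )

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 100  # per arm; assumed sample size for the demo
    treated = np.repeat([False, True], n)
    # Heterogeneous effect: individual effects drawn from N(0.5, 1),
    # which inflates the variance in the treated arm.
    effects = rng.normal(0.5, 1.0, size=2 * n)
    y = rng.normal(0.0, 1.0, size=2 * n) + effects * treated
    print(variance_ratio_randomization_test(y, treated))
```

Wrapping such a function in a loop over simulated data sets, with and without effect heterogeneity, gives empirical type 1 and type 2 error rates of the kind the simulation study compares.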