Power for tests of interaction: effect of raising the Type I error rate

Epidemiol Perspect Innov. 2007 Jun 19:4:4. doi: 10.1186/1742-5573-4-4.


Background: Power for assessing interactions during data analysis is often poor in epidemiologic studies. This is because epidemiologic studies are frequently powered primarily to assess main effects only. In light of this, some investigators raise the Type I error rate, thereby increasing power, when testing interactions. However, this is a poor analysis strategy if the study is chronically under-powered (e.g. in a small study) or already adequately powered (e.g. in a very large study). To demonstrate this point, this study quantified the gain in power for testing interactions when the Type I error rate is raised, for a variety of study sizes and types of interaction.

Methods: Power was computed for the Wald test for interaction, the likelihood ratio test for interaction, and the Breslow-Day test for heterogeneity of the odds ratio. Ten types of interaction, ranging from sub-additive through to super-multiplicative, were investigated in the simple scenario of two binary risk factors. Case-control studies of various sizes were investigated (75 cases & 150 controls, 300 cases & 600 controls, and 1200 cases & 2400 controls).

Results: The strategy of raising the Type I error rate from 5% to 20% resulted in a useful power gain (a gain of at least 10%, resulting in power of at least 70%) in only 7 of the 27 interaction type/study size scenarios studied (26%). In the other 20 scenarios, power was either already adequate (n = 8; 30%), or else so low that it was still weak (below 70%) even after raising the Type I error rate to 20% (n = 12; 44%).

Conclusion: Relaxing the Type I error rate did not usefully improve the power for tests of interaction in many of the scenarios studied. In many studies, the small power gains obtained by raising the Type I error will be more than offset by the disadvantage of increased "false positives". I recommend investigators should not routinely raise the Type I error rate when assessing tests of interaction.