Proc Biol Sci, 284 (1851)

Statistical Model Specification and Power: Recommendations on the Use of Test-Qualified Pooling in Analysis of Experimental Data



Nick Colegrave et al. Proc Biol Sci.


A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure.

Keywords: experimental design; model simplification; pseudoreplication.


Figure 1.
To illustrate how the type I error rate can be affected by test-qualified pooling, we examined simulated datasets for both 4 (dashed line) and 10 (solid line) greenhouses. In both cases, equal numbers of greenhouses were allocated to control or treatment conditions (but condition had no effect on plant growth), and 40 plants were measured in each greenhouse. We also examined the effect of two different alpha levels for the pooling decision (recommended in [3]: open circles = 0.25 and filled circles = 0.75), and several different levels of among-greenhouse variation (σ²). Under many different parameter combinations the actual type I error rate differs from the desired value of 0.05, sometimes substantially. Plant growth rates were calculated as a baseline value (10) plus an individual deviation drawn from N(0,1) plus a greenhouse deviation drawn from N(0,σ) that was the same for all plants in a given greenhouse. We analysed each dataset in two ways. First, we carried out a nested analysis of variance in which the treatment mean square was tested over the among-greenhouses within-treatment mean square. The same analysis tested for variance among greenhouses by comparing the among-greenhouses mean square to the among-plants error mean square. Second, we carried out an analysis in which data from all greenhouses were pooled. The decision as to which p-value to use for our actual hypothesis test for the effect of the treatment was based on the significance of the among-greenhouse test at one of two alpha levels. If this test was significant at the appropriate alpha level, we used the p-value from the nested model; otherwise we used the p-value from the second (pooled) model. This process was repeated 100 000 times. The proportion of these runs that gave a p-value of less than 0.05 (i.e. a false positive at alpha = 0.05) is an estimate of the type I error rate. The simulations were carried out in R, with the aov function being used for the analyses.
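
The simulation procedure described in this caption can be sketched in code. The block below is a minimal illustration, not the authors' script: the original analyses were run in R with aov, whereas this sketch uses only the Python standard library and computes the balanced nested ANOVA sums of squares by hand. All function and parameter names (f_sf, anova_p_values, type_i_error, sigma_gh, alpha_pool) are hypothetical, sigma_gh is assumed to be the standard deviation of the greenhouse deviation, and the default number of runs is far smaller than the 100 000 used for the figure.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the standard recurrence.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def anova_p_values(houses):
    """Return (p_nested, p_greenhouse, p_pooled) for a balanced design.

    houses: one list of plant values per greenhouse; the first half of the
    greenhouses are controls and the second half are treated.
    """
    g, n = len(houses), len(houses[0])
    half = g // 2
    means = [sum(h) / n for h in houses]
    trt_means = [sum(means[:half]) / half, sum(means[half:]) / half]
    grand = sum(means) / g
    ss_trt = half * n * sum((t - grand) ** 2 for t in trt_means)  # df = 1
    ss_gh = n * sum((m - trt_means[1 if j >= half else 0]) ** 2
                    for j, m in enumerate(means))
    ss_err = sum((y - m) ** 2 for h, m in zip(houses, means) for y in h)
    df_gh, df_err = g - 2, g * (n - 1)
    ms_gh, ms_err = ss_gh / df_gh, ss_err / df_err
    # Pooling folds the among-greenhouse variation into the error term.
    ms_pool = (ss_gh + ss_err) / (df_gh + df_err)
    p_nested = f_sf(ss_trt / ms_gh, 1, df_gh)
    p_gh = f_sf(ms_gh / ms_err, df_gh, df_err)
    p_pooled = f_sf(ss_trt / ms_pool, 1, df_gh + df_err)
    return p_nested, p_gh, p_pooled


def type_i_error(n_houses, n_plants, sigma_gh, alpha_pool,
                 n_sims=2000, seed=1):
    """Estimate the type I error rate of test-qualified pooling at 0.05."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        houses = []
        for _ in range(n_houses):
            gh_dev = rng.gauss(0.0, sigma_gh)  # shared by all plants in house
            houses.append([10.0 + rng.gauss(0.0, 1.0) + gh_dev
                           for _ in range(n_plants)])
        p_nested, p_gh, p_pooled = anova_p_values(houses)
        # Keep the nested test only if greenhouses differ "significantly".
        p = p_nested if p_gh < alpha_pool else p_pooled
        false_positives += p < 0.05
    return false_positives / n_sims
```

For example, type_i_error(4, 40, sigma_gh=0.3, alpha_pool=0.25) gives a rough estimate for one parameter combination of the kind plotted in the figure. The degrees of freedom show why pooling is tempting: with 4 greenhouses of 40 plants, the nested treatment test has only 2 error degrees of freedom, whereas the pooled test has 158.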
