Background: P-values are ubiquitous in medical research but are often misunderstood. Besides being misused, or even abused, when scientific inferences and interpretations are drawn after statistical analysis, p-values can also be a source of confusion at the study design stage.
Methods: Applying a standard test statistic to observed data may yield a small p-value, which in turn may give the impression that a new study with the same sample size as the observed data, or perhaps even a smaller one, would have adequate power. We used a re-sampling method to compute empirical statistical power and illustrate the fallacy of this conclusion. We also calculated power using analytical formulae.
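As a rough guide to why a small p-value does not imply adequate power, the standard normal-approximation power formulae can be written out. The abstract does not state which analytical formulae were used, so the expressions below are common textbook forms given as an assumption; the symbols ($p_1, p_2$ for the event rates, $n_1, n_2$ for the group sizes, $\delta$ and $\sigma$ for the mean difference and common standard deviation, $z_{\text{obs}}$ for the observed test statistic) are introduced here for illustration.

```latex
% Two-sided test at level \alpha, ignoring the negligible opposite tail.
% Binary outcome (pooled-variance z-test):
\[
  \text{Power} \approx
  \Phi\!\left(
    \frac{|p_1 - p_2| - z_{1-\alpha/2}\sqrt{\bar p\,(1-\bar p)\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}}
         {\sqrt{\tfrac{p_1(1-p_1)}{n_1}+\tfrac{p_2(1-p_2)}{n_2}}}
  \right),
  \qquad
  \bar p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}
\]
% Continuous outcome (common standard deviation \sigma):
\[
  \text{Power} \approx
  \Phi\!\left(
    \frac{|\delta|}{\sigma\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}} - z_{1-\alpha/2}
  \right)
\]
% If the true effect is taken to equal the observed one and the sample
% sizes are unchanged, the continuous-outcome expression reduces to
\[
  \text{Power} \approx \Phi\!\left(|z_{\text{obs}}| - z_{1-\alpha/2}\right)
\]
```

Under the last approximation, an observed p-value of exactly 0.05 ($|z_{\text{obs}}| = 1.96$) corresponds to only about 50% power for a replicate study of the same size, and roughly 80% power requires $|z_{\text{obs}}| \approx 2.80$, that is, an observed p-value near 0.005.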
Results: We analyzed data from two-group comparisons with binary as well as continuous outcome variables. For the binary outcome, the event rates in the two groups of the illustrative data were 15/43 (35%) and 22/34 (65%), respectively (p-value = 0.0093). Using these data, the bootstrap-based empirical power was estimated to be 75.4%. One random sample containing only two-thirds of the original data had a p-value of 0.0066 but an empirical power of only 57.4%. Similar results were observed for the continuous outcome.
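The bootstrap calculation for the binary outcome can be sketched as follows. This is a minimal illustration and not necessarily the authors' procedure. It assumes a pooled two-proportion z-test (equivalent to Pearson's chi-square test without continuity correction, which gives p = 0.0093 for the counts above), resampling with replacement within each group, and a significance level of 0.05. The function names, the number of bootstrap replicates, and the rounded two-thirds group sizes (29 and 23) are hypothetical choices made for this example.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from the pooled two-proportion z-test
    (same as Pearson's chi-square without continuity correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All outcomes identical: no evidence of a difference.
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_power(x1, n1, x2, n2, m1=None, m2=None,
                    n_boot=5000, alpha=0.05, seed=0):
    """Fraction of bootstrap samples with p < alpha.

    Each replicate draws m1 and m2 observations with replacement from the
    observed groups (defaults: the original group sizes n1 and n2).
    """
    rng = random.Random(seed)
    m1 = n1 if m1 is None else m1
    m2 = n2 if m2 is None else m2
    group1 = [1] * x1 + [0] * (n1 - x1)
    group2 = [1] * x2 + [0] * (n2 - x2)
    rejections = 0
    for _ in range(n_boot):
        b1 = sum(rng.choices(group1, k=m1))
        b2 = sum(rng.choices(group2, k=m2))
        if two_proportion_p_value(b1, m1, b2, m2) < alpha:
            rejections += 1
    return rejections / n_boot


if __name__ == "__main__":
    # Observed data from the abstract: 15/43 versus 22/34.
    print(f"observed p-value: {two_proportion_p_value(15, 43, 22, 34):.4f}")
    print(f"empirical power, same sample size: "
          f"{bootstrap_power(15, 43, 22, 34):.3f}")
    # Assumed two-thirds design: group sizes rounded to 29 and 23.
    print(f"empirical power, two-thirds sample size: "
          f"{bootstrap_power(15, 43, 22, 34, m1=round(43 * 2 / 3), m2=round(34 * 2 / 3)):.3f}")
```

Exact estimates depend on the random seed, the number of replicates, and the test used, so the output should be read as approximate rather than as a reproduction of the reported 75.4% and 57.4%.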
Conclusion: Our results show that the number of zeros after the decimal point in a p-value from an observed sample cannot and should not be used to gauge whether the sample size of a future study would provide sufficient power to detect an effect as large as the one observed.