Background: Randomized controlled trials (RCTs) are considered the gold standard for evidence-based clinical research, but prior work has suggested that there may be poor reporting of sample sizes in the surgical literature. Sample size calculations are essential for planning a study to minimize both type I and type II errors. We hypothesized that sample size calculations may not be performed consistently in surgery studies and, therefore, many studies may be "underpowered." To address this issue, we reviewed RCTs published in the surgical literature to determine how often sample size calculations were reported and to analyze each study's ability to detect varying degrees of differences in outcomes.
Methods: A comprehensive MEDLINE search identified RCTs published in Annals of Surgery, Archives of Surgery, and Surgery between 1999 and 2002. Each study was evaluated by two independent reviewers. Post hoc sample size calculations were performed to determine whether each study had 80% power to detect differences between treatment groups of 50% (large effect) and 20% (small effect), using a one-sided test with alpha = 0.05. For the underpowered studies, the degree to which the sample size would need to be increased was determined.
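The sample size calculation described above can be illustrated with a standard normal-approximation formula for comparing two proportions. This is a sketch, not the authors' actual computation; the baseline event rate of 30% below is a hypothetical value chosen for illustration, whereas the study reviewers would have used each trial's own observed rates.

```python
from statistics import NormalDist

def n_per_group(p_control, rel_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a relative
    reduction in event rate, one-sided test (normal approximation)."""
    p_treat = p_control * (1 - rel_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)        # power = 0.80
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2

# Hypothetical baseline event rate of 30%:
large_effect = n_per_group(0.30, 0.50)  # detect 30% -> 15%, ~93 per arm
small_effect = n_per_group(0.30, 0.20)  # detect 30% -> 24%, ~674 per arm
```

The comparison makes the abstract's point concrete: detecting a 20% difference requires roughly seven times as many patients as detecting a 50% difference, which is why far fewer of the reviewed trials were powered for small effects.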
Results: One hundred twenty-seven RCT articles were identified; of these, 48 (38%) reported sample size calculations. Eighty-six (68%) studies reported a positive treatment effect, whereas 41 (32%) reported negative results. Sixty-three (50%) of the studies were appropriately powered to detect a 50% difference between treatment groups, whereas 24 (19%) had the power to detect a 20% difference. Of the underpowered studies, more than half would have needed to increase their sample size by more than 10-fold.
Conclusions: Sample size calculations were not reported in more than 60% of recently published surgical RCTs. Moreover, only half of the studies had sample sizes adequate to detect large differences between treatment groups.