Generalized estimating equations in cluster randomized trials with a small number of clusters: Review of practice and simulation study

Clin Trials. 2016 Aug;13(4):445-9. doi: 10.1177/1740774516643498. Epub 2016 Apr 19.


Background/aims: Generalized estimating equations are a common modeling approach used in cluster randomized trials to account for within-cluster correlation. It is well known that the sandwich variance estimator is biased when the number of clusters is small (≤40), resulting in an inflated type I error rate. Various bias correction methods have been proposed in the statistical literature, but how adequately they are utilized in current practice for cluster randomized trials is not clear. The aim of this study is to evaluate the use of generalized estimating equation bias correction methods in recently published cluster randomized trials and demonstrate the necessity of such methods when the number of clusters is small.
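
For reference, the sandwich (robust) variance estimator mentioned above is usually written in the following standard form. The notation is an assumption introduced here for illustration and does not appear in the abstract: $K$ is the number of clusters, $\mu_i$ the mean vector for cluster $i$, $D_i = \partial \mu_i / \partial \beta^\top$, $V_i$ the working covariance matrix, and $\hat{r}_i = Y_i - \hat{\mu}_i$ the vector of fitted residuals.

$$
\widehat{\operatorname{Var}}(\hat{\beta}) =
\left( \sum_{i=1}^{K} D_i^\top V_i^{-1} D_i \right)^{-1}
\left( \sum_{i=1}^{K} D_i^\top V_i^{-1} \hat{r}_i \hat{r}_i^\top V_i^{-1} D_i \right)
\left( \sum_{i=1}^{K} D_i^\top V_i^{-1} D_i \right)^{-1}
$$

The middle term uses $\hat{r}_i \hat{r}_i^\top$ as an estimate of $\operatorname{Cov}(Y_i)$. Because fitted residuals tend to be smaller than the true errors, this estimate is too small on average when $K$ is small, which is the source of the bias described above.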

Methods: We reviewed cluster randomized trials published between August 2013 and July 2014 that used generalized estimating equations for their primary analyses. Two independent reviewers collected data from each study using a standardized, pre-piloted data extraction template. A two-arm cluster randomized trial was simulated under various scenarios to show the potential effect of a small number of clusters on the type I error rate when estimating the treatment effect. The nominal level was set at 0.05 for the simulation study.
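
A simulation of this kind can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the study: a continuous outcome, equal cluster sizes, an even number of clusters split equally between arms, an independence working correlation with identity link, and hypothetical values for the cluster size and intracluster correlation (`cluster_size`, `icc`). Under those assumptions the generalized estimating equation estimate of the treatment effect is the difference in arm means, and the uncorrected sandwich variance has a simple closed form. The function names are hypothetical.

```python
import math
import random


def simulate_trial(n_clusters, cluster_size, icc, rng):
    """Simulate one two-arm trial with no true treatment effect.

    Returns the z statistic based on the uncorrected sandwich variance.
    Assumes n_clusters is even and total outcome variance is 1.
    """
    sd_between = math.sqrt(icc)        # cluster-level random effect
    sd_within = math.sqrt(1.0 - icc)   # individual-level noise
    per_arm = n_clusters // 2
    arm_means, arm_vars = [], []
    for _arm in range(2):
        clusters = []
        for _ in range(per_arm):
            u = rng.gauss(0.0, sd_between)
            clusters.append(
                [u + rng.gauss(0.0, sd_within) for _ in range(cluster_size)]
            )
        n_arm = per_arm * cluster_size
        mean = sum(sum(c) for c in clusters) / n_arm
        # Sandwich "meat": squared sum of residuals within each cluster.
        meat = sum(sum(y - mean for y in c) ** 2 for c in clusters)
        arm_means.append(mean)
        arm_vars.append(meat / n_arm ** 2)
    effect = arm_means[1] - arm_means[0]
    return effect / math.sqrt(arm_vars[0] + arm_vars[1])


def type_one_error(n_clusters, cluster_size=20, icc=0.05, n_sims=2000, seed=1):
    """Share of null trials rejected at the two-sided 0.05 level (|z| > 1.96)."""
    rng = random.Random(seed)
    rejections = sum(
        abs(simulate_trial(n_clusters, cluster_size, icc, rng)) > 1.96
        for _ in range(n_sims)
    )
    return rejections / n_sims
```

Comparing `type_one_error(4)` with `type_one_error(40)` should show the rejection rate falling toward 0.05 as clusters are added while staying above it. The exact values depend on the assumed outcome type, cluster size, and intracluster correlation, so they need not match the rates reported in this study.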

Results: Of the 51 included trials, 28 (54.9%) analyzed 40 or fewer clusters; the smallest of these trials had four clusters in total. Of these 28 trials, only one used a bias correction method for generalized estimating equations. The simulation study showed that with four clusters, the type I error rate ranged between 0.43 and 0.47. Although the type I error rate moved closer to the nominal level as the number of clusters increased, it still ranged between 0.06 and 0.07 with 40 clusters.

Conclusions: Our results showed that the statistical issues arising from a small number of clusters in generalized estimating equations are currently inadequately handled in cluster randomized trials. The potential for type I error inflation can be very high when the sandwich estimator is used without bias correction.

Keywords: Cluster randomized trials; bias correction; generalized estimating equations; sandwich estimator; small number of clusters.

Publication types

  • Review

MeSH terms

  • Cluster Analysis*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Randomized Controlled Trials as Topic*
  • Research Design
  • Sample Size*
  • Selection Bias