Power determination for geographically clustered data using generalized estimating equations

Stat Med. 1996 Sep 15-30;15(17-18):1951-60. doi: 10.1002/(sici)1097-0258(19960930)15:18<1951::aid-sim407>3.0.co;2-p.


Study designs in public health research often require estimating the effects of interventions that are applied to clusters of subjects in a common geographic area, rather than randomly assigned to individual subjects, and where the outcome is dichotomous. Statistical methods that account for the intracluster correlation of measurements must be used; otherwise, the standard errors of regression coefficients will be underestimated. Generalized estimating equations (GEE) can account for this correlation, but there are no straightforward methods for determining the sample size required for adequate power. A simulation study was performed to calculate power in a GEE model for a proposed study of an intervention designed to reduce lower-back injuries among nursing personnel employed in nursing homes. Nursing homes will be randomly assigned to either an intervention or a control group, and all employees within a nursing home will be treated alike. Historical injury data indicate that the baseline injury risk for each home can be reasonably modelled using a beta distribution. The risk for any individual nurse within a nursing home is assumed to follow a Bernoulli distribution whose probability is expressed as a logit function of fixed covariates and a random-intercept term specific to each home; the covariates represent characteristics of the study population, and their odds ratios were taken from previous studies. Results indicate that failing to account for intracluster correlation can lead to overestimated power as well as inflation of type I error by as much as 20 per cent. Although the GEE method accounted for intracluster correlation when it was present, estimates of the intracluster correlation were negatively biased when no intracluster correlation was present. In addition, and possibly related to these negatively biased estimates, we also found inflated type I error estimates with the GEE method.
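The simulation approach described above can be sketched in a few dozen lines. This is a minimal Monte Carlo sketch under stated assumptions, not the authors' program. It keeps only the home-level intervention indicator (no nurse-level covariates), draws each home's baseline risk from a beta distribution whose logit serves as the random intercept, and analyzes each simulated trial with a logistic GEE under an independence working correlation, comparing model-based ("naive") standard errors with cluster-robust sandwich standard errors. With a single binary covariate, the independence GEE estimates reduce to the arm-level injury proportions, so the sandwich variance has the closed form used in `wald_z`. The function names, the Beta(2, 18) baseline (mean risk 0.1), the 10 homes per arm, 40 nurses per home, and the odds ratio of 0.5 are hypothetical choices for illustration. Because the working correlation is independence, this sketch does not estimate the intracluster correlation, and it applies no small-sample correction to the sandwich estimator.

```python
import math
import random
from statistics import NormalDist


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_trial(rng, homes_per_arm, nurses_per_home, a, b, odds_ratio):
    """Simulate one cluster-randomized trial; return (arm, injuries, nurses) per home."""
    homes = []
    for arm in (0, 1):  # 0 = control, 1 = intervention
        for _ in range(homes_per_arm):
            # Home-specific baseline risk from Beta(a, b); its logit is the random intercept.
            base = min(max(rng.betavariate(a, b), 1e-12), 1 - 1e-12)
            p = expit(logit(base) + arm * math.log(odds_ratio))
            # Each nurse is an independent Bernoulli draw given the home's risk.
            injuries = sum(rng.random() < p for _ in range(nurses_per_home))
            homes.append((arm, injuries, nurses_per_home))
    return homes


def wald_z(homes):
    """Wald z for the intervention log odds ratio: (naive, cluster-robust)."""
    n, y = [0, 0], [0, 0]
    for arm, injuries, size in homes:
        n[arm] += size
        y[arm] += injuries
    p = [y[g] / n[g] for g in (0, 1)]
    if any(q in (0.0, 1.0) for q in p):
        return 0.0, 0.0  # logit undefined in an arm; count as non-rejection
    beta1 = logit(p[1]) - logit(p[0])
    # Model-based ("bread") information for each arm's intercept.
    info = [n[g] * p[g] * (1.0 - p[g]) for g in (0, 1)]
    var_naive = 1.0 / info[0] + 1.0 / info[1]
    # "Meat": score summed within each home, squared, then summed across homes.
    meat = [0.0, 0.0]
    for arm, injuries, size in homes:
        meat[arm] += (injuries - size * p[arm]) ** 2
    var_robust = meat[0] / info[0] ** 2 + meat[1] / info[1] ** 2
    z_naive = beta1 / math.sqrt(var_naive)
    z_robust = beta1 / math.sqrt(var_robust) if var_robust > 0 else 0.0
    return z_naive, z_robust


def rejection_rates(n_sims, seed=1, alpha=0.05, **design):
    """Monte Carlo rejection rates (naive, robust) of a two-sided Wald test."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    naive = robust = 0
    for _ in range(n_sims):
        z_naive, z_robust = wald_z(simulate_trial(rng, **design))
        naive += abs(z_naive) > crit
        robust += abs(z_robust) > crit
    return naive / n_sims, robust / n_sims


if __name__ == "__main__":
    # Hypothetical design values, chosen only for illustration.
    design = dict(homes_per_arm=10, nurses_per_home=40, a=2.0, b=18.0)
    print("type I error, OR = 1:", rejection_rates(500, odds_ratio=1.0, **design))
    print("power, OR = 0.5:", rejection_rates(500, odds_ratio=0.5, **design))
```

Running with `odds_ratio=1.0` estimates type I error, and comparing the naive and robust rates shows the effect of ignoring clustering; rerunning with a candidate number of homes per arm and a target odds ratio gives a simulated power estimate for that design.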

MeSH terms

  • Algorithms
  • Bias
  • Computer Simulation
  • Health Services Research / methods
  • Health Services Research / statistics & numerical data*
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Low Back Pain / prevention & control
  • Models, Statistical*
  • Nursing Staff
  • Occupational Diseases / prevention & control
  • Sample Size*
  • Small-Area Analysis*