A comparison of statistical methods for clustered data analysis with Gaussian error

Stat Med. 1996 Aug 30;15(16):1793-806. doi: 10.1002/(SICI)1097-0258(19960830)15:16<1793::AID-SIM332>3.0.CO;2-2.


We investigate by simulation the properties of four estimation procedures under a linear model for correlated data with Gaussian error: maximum likelihood (ML) based on the normal mixed linear model; generalized estimating equations; a four-stage method; and a bootstrap method that resamples clusters rather than individuals. We pay special attention to group randomized trials, in which the number of independent clusters is small, cluster sizes are large, and the within-cluster correlation is weak. We show that for balanced and nearly balanced data with a small number of independent clusters (≤ 10), the bootstrap is superior if analysts do not want to impose strong distributional and covariance-structure assumptions. Otherwise, the ML and four-stage methods are slightly better. All four methods perform well when the number of independent clusters reaches 50.
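
The cluster-level resampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation code. The data-generating model (a random cluster intercept plus Gaussian noise, with total variance fixed at 1), the individual-level covariate, the ordinary least squares slope as the estimator, and all function names and parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_clustered(n_clusters, cluster_size, icc, beta, rng):
    """Hypothetical generator: y = beta * x + cluster effect + Gaussian noise.

    Total error variance is 1; `icc` is the share due to the cluster effect,
    so it equals the within-cluster correlation of the errors.
    """
    n = n_clusters * cluster_size
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.normal(0.0, 1.0, n)                      # individual-level covariate (assumption)
    u = rng.normal(0.0, np.sqrt(icc), n_clusters)    # shared cluster effects
    e = rng.normal(0.0, np.sqrt(1.0 - icc), n)       # individual noise
    return x, beta * x + u[cluster] + e, cluster

def ols_slope(x, y):
    """Slope from an ordinary least squares fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def cluster_bootstrap_se(x, y, cluster, n_boot, rng):
    """Bootstrap standard error that resamples whole clusters with replacement."""
    ids = np.unique(cluster)
    rows_by_cluster = [np.flatnonzero(cluster == c) for c in ids]
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Draw as many clusters as were observed; every member of a drawn
        # cluster enters the replicate, so within-cluster correlation is kept.
        drawn = rng.integers(0, len(ids), size=len(ids))
        rows = np.concatenate([rows_by_cluster[k] for k in drawn])
        slopes[b] = ols_slope(x[rows], y[rows])
    return slopes.std(ddof=1)

rng = np.random.default_rng(0)
# Few, large, weakly correlated clusters, as in the setting the abstract emphasizes.
x, y, cluster = simulate_clustered(n_clusters=10, cluster_size=50, icc=0.05, beta=1.0, rng=rng)
slope = ols_slope(x, y)
se = cluster_bootstrap_se(x, y, cluster, n_boot=500, rng=rng)
print(f"slope estimate: {slope:.3f}, cluster-bootstrap SE: {se:.3f}")
```

Resampling individuals instead of clusters would treat correlated observations as independent; drawing whole clusters is what lets the replicate-to-replicate spread reflect between-cluster variation without specifying a covariance structure.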

Publication types

  • Comparative Study

MeSH terms

  • Bias*
  • Cluster Analysis*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Least-Squares Analysis
  • Likelihood Functions
  • Linear Models
  • Normal Distribution*
  • Sample Size
  • Statistics, Nonparametric