Advanced statistics: statistical methods for analyzing cluster and cluster-randomized data

Acad Emerg Med. 2002 Apr;9(4):330-41. doi: 10.1111/j.1553-2712.2002.tb01332.x.


Sometimes interventions in randomized clinical trials are not allocated to individual patients, but rather to patients in groups. This is called cluster allocation, or cluster randomization, and is particularly common in health services research. Similarly, in some types of observational studies, patients (or observations) are found in naturally occurring groups, such as neighborhoods. In either situation, observations within a cluster tend to be more alike than observations selected entirely at random. This violates the assumption of independence that is at the heart of common methods of statistical estimation and hypothesis testing. Failure to account for the dependence among observations within the same cluster can have profound implications for the design and analysis of such studies. When clustering is ignored, p-values will be too small, confidence intervals too narrow, and sample size estimates too small, sometimes to a dramatic degree. This problem is similar to that caused by the more familiar "unit of analysis error" seen when repeated observations on the same subjects are treated as independent. The purpose of this paper is to provide an introduction to the problem of clustered data in clinical research. It provides guidance and examples of methods for analyzing clustered data and calculating sample sizes when planning studies. The article concludes with some general comments on statistical software for clustered data and principles for planning, analyzing, and presenting such studies.
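
To make the sample size point concrete, the following is a minimal sketch of the widely used design effect adjustment, 1 + (m - 1) × ICC, where m is the number of subjects per cluster and ICC is the intracluster correlation coefficient. The abstract itself does not state this formula, so its use here is an assumption, as are the function names and the example values; equal cluster sizes are also assumed.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes.

    cluster_size: subjects per cluster (m); icc: intracluster correlation.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is assumed to lie between 0 and 1 here")
    return 1.0 + (cluster_size - 1.0) * icc


def inflate_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Scale a sample size computed under independence by the design effect."""
    n = n_individual * design_effect(cluster_size, icc)
    # Subtract a tiny tolerance so floating-point noise does not add a subject.
    return math.ceil(n - 1e-9)


def clusters_needed(n_individual: int, cluster_size: int, icc: float) -> int:
    """Number of clusters (per arm) needed to reach the inflated sample size."""
    n = inflate_sample_size(n_individual, cluster_size, icc)
    return math.ceil(n / cluster_size)


if __name__ == "__main__":
    # Hypothetical planning values, chosen only for illustration.
    n_indep, m, rho = 200, 20, 0.05
    print("design effect:", design_effect(m, rho))
    print("subjects per arm:", inflate_sample_size(n_indep, m, rho))
    print("clusters per arm:", clusters_needed(n_indep, m, rho))
```

Even a small ICC can inflate the required sample size considerably when clusters are large, which is one way to see why ignoring clustering yields sample size estimates that are too small.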

MeSH terms

  • Cluster Analysis*
  • Data Interpretation, Statistical*
  • Random Allocation
  • Randomized Controlled Trials as Topic / standards
  • Randomized Controlled Trials as Topic / statistics & numerical data*