Design and analysis of controlled trials in naturally clustered environments: implications for medical informatics

J Am Med Inform Assoc. 2002 May-Jun;9(3):230-8. doi: 10.1197/jamia.m0997.


In medical informatics research, study questions frequently involve individuals who are grouped into clusters. For example, an intervention may be aimed at a clinician (who treats a cluster of patients) with the intention of improving the health of individual patients. Ignoring the correlation among individuals within a cluster can lead to incorrect estimates of the sample size required to detect an effect and to inappropriate estimates of the confidence intervals and statistical significance of the intervention effects. Contamination, the spread of the effect of an intervention or control treatment to the opposite group, often occurs between individuals within clusters. It attenuates the observed effect of the intervention and reduces the power to detect a difference. If individuals are randomized in a clinical trial (individual-randomized trial), then correlation must be taken into account in the analysis, and the sample size may need to be increased to compensate for contamination. Randomizing clusters rather than individuals (cluster-randomized trial) can eliminate contamination and may be preferred for logistical reasons. Cluster-randomized trials are generally less efficient than individual-randomized trials, so the tradeoffs must be assessed. Correlation must be taken into account in both the analysis and the sample-size calculations for cluster-randomized trials.
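
The sample-size tradeoff described in the abstract can be illustrated numerically. The sketch below is not taken from the article. It assumes equal cluster sizes, the widely used design effect 1 + (m - 1) * ICC for cluster randomization (m is the cluster size and ICC the intracluster correlation coefficient), and a simple contamination model in which a fraction c of the control group effectively receives the intervention, so that the observed effect shrinks by a factor of (1 - c). All function names and parameters are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC.

    Assumption: every cluster has the same size m and a common ICC.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def _ceil(value: float) -> int:
    # Round first so floating-point noise (e.g. 390.00000000000006)
    # does not push the result up by one.
    return math.ceil(round(value, 9))


def cluster_randomized_size(n_per_arm: int, cluster_size: int, icc: float):
    """Individuals and clusters needed per arm in a cluster-randomized trial.

    n_per_arm is the per-arm sample size that would suffice if
    individuals were randomized and their outcomes were independent.
    """
    individuals = _ceil(n_per_arm * design_effect(cluster_size, icc))
    clusters = math.ceil(individuals / cluster_size)
    return individuals, clusters


def contaminated_individual_size(n_per_arm: int, contaminated_fraction: float) -> int:
    """Per-arm size for an individual-randomized trial with contamination.

    Assumed model: the observed effect is (1 - c) times the true effect,
    so the required sample size grows by a factor of 1 / (1 - c)^2.
    """
    if not 0.0 <= contaminated_fraction < 1.0:
        raise ValueError("contaminated_fraction must lie in [0, 1)")
    inflation = 1.0 / (1.0 - contaminated_fraction) ** 2
    return _ceil(n_per_arm * inflation)


if __name__ == "__main__":
    # Hypothetical planning numbers, chosen only for illustration.
    base_n = 100
    print(cluster_randomized_size(base_n, cluster_size=5, icc=0.25))
    print(contaminated_individual_size(base_n, contaminated_fraction=0.5))
```

Under these assumptions, an ICC of 0.25 with clusters of 5 doubles the required sample size, whereas contamination of half the control group quadruples it. Comparing the two inflation factors is one simple way to weigh cluster randomization against individual randomization for a given setting.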

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Cluster Analysis
  • Randomized Controlled Trials as Topic / methods*