Are missing data adequately handled in cluster randomised trials? A systematic review and guidelines

Clin Trials. 2014 Oct;11(5):590-600. doi: 10.1177/1740774514537136. Epub 2014 Jun 5.


Background: Missing data are a potential source of bias, and their handling in the statistical analysis can have an important impact on both the likelihood and degree of such bias. Inadequate handling of the missing data may also result in invalid variance estimation. The handling of missing values is more complex in cluster randomised trials, but there are no reviews of practice in this field.
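The variance point can be made concrete with a minimal Python sketch. It is not taken from the review and uses simulated, unclustered data in which values are deleted completely at random. It compares the naive standard error of a mean after single (mean) imputation with the complete-case standard error. The function names `mean_impute`, `naive_standard_error` and `simulate`, and all parameter values, are illustrative assumptions.

```python
import random
import statistics


def mean_impute(values):
    """Single imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def naive_standard_error(values):
    """Standard error of the mean, treating every value as genuinely observed."""
    return statistics.stdev(values) / len(values) ** 0.5


def simulate(n=1000, p_missing=0.3, seed=2014):
    """Hypothetical example: unclustered outcomes, deleted completely at random."""
    rng = random.Random(seed)
    outcomes = [rng.gauss(10.0, 2.0) for _ in range(n)]
    # Each value is dropped independently with probability p_missing (MCAR).
    incomplete = [None if rng.random() < p_missing else y for y in outcomes]
    observed = [y for y in incomplete if y is not None]
    imputed = mean_impute(incomplete)
    return {
        "n_observed": len(observed),
        "se_complete_case": naive_standard_error(observed),
        "se_mean_imputed": naive_standard_error(imputed),
    }


if __name__ == "__main__":
    print(simulate())
```

Because every imputed value sits exactly at the observed mean, mean imputation adds no spread but inflates the apparent sample size, so the imputed-data standard error is smaller by roughly the fraction of values observed. An analysis that also ignored positive within-cluster correlation would understate the standard error further; clustering is left out here only to keep the example short.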

Objectives: A systematic review of published trials was conducted to examine how missing data are reported and handled in cluster randomised trials.

Methods: We systematically identified cluster randomised trials published in English in 2011 using the National Library of Medicine's MEDLINE database via PubMed. Non-randomised and pilot/feasibility trials were excluded, as were reports of secondary analyses, interim analyses and economic evaluations, and reports in which no data were available at the individual level. From a random sample of the identified studies, we extracted information on missing data and on the statistical methods used to handle them.

Results: We included 132 trials. There was evidence of missing data in 95 (72%). Only 32 trials reported handling missing data, 22 of them using a variety of single imputation techniques, 8 using multiple imputation without accommodating the clustering and 2 stating that their likelihood-based complete case analysis accounted for missing values because the data were assumed Missing-at-Random.
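For reference, results from multiple imputations are usually combined with Rubin's rules. The sketch below is a generic illustration, not the procedure used by any of the reviewed trials; `pool_rubin` and the example numbers are hypothetical. The pooling step does not by itself address clustering: in a cluster randomised trial, the imputation model should reflect the clustering, and each completed-data variance should come from an analysis that accounts for it.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine M completed-data estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, each with one variance")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, t, math.sqrt(t)


if __name__ == "__main__":
    # Hypothetical treatment-effect estimates from five imputed datasets.
    print(pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5))
```

With these illustrative inputs, the pooled estimate is 1.0, the between-imputation variance is 0.10 / 4 = 0.025, and the total variance is 0.04 + 1.2 * 0.025 = 0.07. The between-imputation term is what single imputation omits.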

Limitations: The results presented in this study are based on a large random sample of cluster randomised trials published in 2011. These trials were identified through electronic searches, which may have missed some trials, most likely those of poorer quality. Also, our results are based on the information in the main publication for each trial. These reports may omit important information on the presence of, and reasons for, missing data and on the statistical methods used to handle them. Because our extraction relied on published reports, it could not distinguish between missing data in outcomes and missing data in covariates. This distinction may be important in determining which assumptions about the missing data mechanism are needed for complete case analyses to be valid.
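The outcome/covariate distinction matters because complete case analysis can be valid under some Missing-at-Random mechanisms but not others. As a hedged illustration on simulated, unclustered data (`ols` and `simulate` are hypothetical helpers), suppose the outcome is more often missing when a fully observed baseline covariate is high. The unadjusted complete-case mean is then biased, whereas a complete-case regression on that covariate, averaged over all participants' covariate values, recovers the full-data mean.

```python
import random
import statistics


def ols(xs, ys):
    """Simple least-squares fit y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def simulate(n=20000, seed=11):
    """Hypothetical example: outcome missingness depends only on covariate x (MAR)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + x + rng.gauss(0.0, 1.0) for x in xs]  # true mean of y is 2.0
    # Outcome observed with probability 0.2 if x > 0, otherwise 0.9.
    observed = [rng.random() < (0.2 if x > 0 else 0.9) for x in xs]
    cc_x = [x for x, o in zip(xs, observed) if o]
    cc_y = [y for y, o in zip(ys, observed) if o]
    a, b = ols(cc_x, cc_y)
    # Average complete-case predictions over everyone's (fully observed) covariate.
    adjusted_mean = statistics.fmean([a + b * x for x in xs])
    return {
        "full_data_mean": statistics.fmean(ys),  # unavailable in practice; reference only
        "complete_case_mean": statistics.fmean(cc_y),
        "adjusted_mean": adjusted_mean,
    }


if __name__ == "__main__":
    print(simulate())
```

This relies on the missingness depending only on a covariate that is included in the model. If missingness also depended on the unobserved outcome values themselves (Missing-Not-at-Random), neither estimate would be guaranteed to be unbiased.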

Conclusions: Missing data are present in the majority of cluster randomised trials. However, they are poorly reported, and most authors give little consideration to the assumptions under which their analysis will be valid. The majority of the methods currently used are valid only under very strong assumptions about the missing data, and the plausibility of these assumptions is rarely discussed in the corresponding reports. This may have important consequences for the validity of inferences in some trials. Methods that yield valid inferences under general Missing-at-Random assumptions are available and should be made more accessible.

Keywords: Cluster randomised trials; missing data; multiple imputation.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review
  • Systematic Review

MeSH terms

  • Data Interpretation, Statistical
  • Guidelines as Topic*
  • Humans
  • Randomized Controlled Trials as Topic*
  • Research Design
  • Statistics as Topic*