Background: This simulation study compares the accuracy and efficiency of population-averaged models, fitted with generalized estimating equations (GEE), and cluster-specific models, fitted with random-effects logistic regression (RELR), for analyzing data from cluster randomized trials (CRTs) with missing binary responses.
Methods: Clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, the intra-cluster correlation coefficient, and the percentage of missing data were varied. Under the assumption of covariate-dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI), and within-cluster MI. Data were then analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability.
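As an illustration of the data-generating step and the four performance measures, the following is a minimal Python sketch, not the authors' code. It assumes the usual beta-binomial parameterization in which each cluster's response probability is drawn from a Beta(a, b) distribution with a = π(1 − ρ)/ρ and b = (1 − π)(1 − ρ)/ρ, so that the mean is π and the intra-cluster correlation is ρ. It also assumes standardized bias is expressed as a percentage of the empirical standard error and that coverage uses Wald-type 95% intervals. The function names `simulate_arm` and `performance` are hypothetical.

```python
import numpy as np

def simulate_arm(n_clusters, cluster_size, prob, icc, rng):
    """Binary responses for one trial arm under a beta-binomial model.

    Assumed parameterization: cluster probabilities ~ Beta(a, b) with
    mean `prob` and intra-cluster correlation `icc` = 1 / (a + b + 1).
    """
    a = prob * (1 - icc) / icc
    b = (1 - prob) * (1 - icc) / icc
    p = rng.beta(a, b, size=n_clusters)  # one probability per cluster
    # Rows are clusters, columns are subjects within a cluster.
    return rng.binomial(1, p[:, None], size=(n_clusters, cluster_size))

def performance(estimates, std_errors, truth, z=1.96):
    """Summarize estimates of one parameter across simulation replicates."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    emp_se = estimates.std(ddof=1)       # empirical standard error
    bias = estimates.mean() - truth
    return {
        # Assumed definition: bias as a percentage of the empirical SE.
        "standardized_bias_pct": 100 * bias / emp_se,
        "empirical_se": emp_se,
        "rmse": np.sqrt(np.mean((estimates - truth) ** 2)),
        # Share of Wald intervals (estimate +/- z * SE) that cover the truth.
        "coverage": np.mean(np.abs(estimates - truth) <= z * std_errors),
    }
```

In a full simulation, `simulate_arm` would be called once per arm per replicate, missingness would then be imposed, and the estimates and standard errors returned by GEE or RELR would be collected and passed to `performance`.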
Results: Provided that the downward bias of the standard error is adjusted appropriately when the number of clusters per arm is small, GEE performs well on all four measures under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with a variance inflation factor (VIF) < 3; and within-cluster MI for CRTs with VIF ≥ 3 and cluster size > 50. RELR performs well only when a small amount of data is missing and complete case analysis is applied.
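The VIF thresholds can be made concrete with a short sketch. The abstract does not state how VIF is computed; the code below assumes the usual design effect for equal cluster sizes, VIF = 1 + (m − 1)ρ, where m is the cluster size and ρ is the intra-cluster correlation coefficient. The function name `variance_inflation_factor` is hypothetical.

```python
def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal cluster sizes (assumed): 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

# Example: 20 subjects per cluster with ICC 0.05 falls below the VIF < 3
# threshold, while 100 subjects per cluster with the same ICC exceeds it.
small = variance_inflation_factor(20, 0.05)
large = variance_inflation_factor(100, 0.05)
```

Under this assumed formula, the same intra-cluster correlation can fall on either side of the threshold depending on cluster size alone.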
Conclusion: GEE performs well as long as the missing data strategy is chosen according to the design of the CRT and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI is applied prior to the analysis.