Analysis of partially observed clustered data using generalized estimating equations and multiple imputation

Stata J. 2014 Oct 1;14(4):863-883.

Abstract

Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously estimation of partially observed clustered data was computationally challenging however recent developments in Stata have facilitated their use in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness.

Keywords: ALSPAC study; eating disorders; generalized estimating equations; missing at random; missing completely at random; missing data; multiple imputation; multiple informants; weighted estimating equations.