Longitudinal studies are increasingly popular in epidemiology. In this tutorial we provide a detailed review of the methods we used in the analysis of a longitudinal (multiwave or panel) study of adolescent health, focusing on smoking behaviour. This example is explored in detail with the principal aim of providing an introduction to the analysis of longitudinal binary data, at a level suited to statisticians familiar with logistic regression and survival analysis but not necessarily experienced in longitudinal analysis or estimating equation methods. We describe recent advances in statistical methodology that can play a practical role in applications and are available with standard software. Our approach emphasizes the importance of stating clear research questions, and for binary outcomes we suggest that these are best organized around the key epidemiological concepts of prevalence and incidence. For prevalence questions, we show how unbiased estimating equations and information-sandwich variance estimates may be used to produce a valid and robust analysis, provided the sample size is reasonably large. We also show how the estimating equation approach readily extends to accommodate adjustments for missing data and complex survey design. A detailed discussion of gender-related differences over time in our smoking outcome is used to emphasize the need for great care in separating longitudinal from cross-sectional information. We show how incidence questions may be addressed using a discrete-time version of the proportional hazards regression model. This approach has the advantages of providing estimates of relative risks, being feasible with standard software, and also allowing robust information-sandwich variance estimates.
Copyright 1999 John Wiley & Sons, Ltd.
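As an illustration of the prevalence analysis summarized above, the following is a minimal sketch in Python with NumPy. It assumes a logistic link and an independence working correlation, neither of which is stated in the abstract. It solves the unbiased estimating equations by Newton iteration and then forms an information-sandwich variance estimate with subjects as clusters. The function name, the variable names and the simulated data are hypothetical and are not taken from the study.

```python
import numpy as np


def fit_logistic_sandwich(X, y, cluster, n_iter=50, tol=1e-10):
    """Solve sum_i X_i'(y_i - mu_i) = 0 and return (beta, sandwich covariance).

    Assumes an independence working correlation and a logistic link;
    `cluster` identifies the subject that each row (wave) belongs to.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    beta = np.zeros(X.shape[1])

    # Newton iteration on the estimating equations.
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - mu)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # "Bread": model-based information evaluated at the solution.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])

    # "Meat": outer products of the per-subject score contributions.
    resid = y - mu
    meat = np.zeros_like(info)
    for c in np.unique(cluster):
        rows = cluster == c
        u = X[rows].T @ resid[rows]
        meat += np.outer(u, u)

    info_inv = np.linalg.inv(info)
    cov = info_inv @ meat @ info_inv
    return beta, cov


# Simulated panel (hypothetical, for illustration only): 200 subjects, 4 waves,
# with a subject-level random intercept inducing within-subject correlation.
rng = np.random.default_rng(0)
n_subj, n_wave = 200, 4
subject = np.repeat(np.arange(n_subj), n_wave)
wave = np.tile(np.arange(n_wave), n_subj)
female = np.repeat(rng.integers(0, 2, n_subj), n_wave)
b = np.repeat(rng.normal(0.0, 1.0, n_subj), n_wave)
eta = -1.0 + 0.3 * female + 0.2 * wave + b
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
X = np.column_stack([np.ones_like(wave), female, wave])

beta_hat, cov_hat = fit_logistic_sandwich(X, y, subject)
robust_se = np.sqrt(np.diag(cov_hat))
print("coefficients:", beta_hat)
print("robust standard errors:", robust_se)
```

The robust standard errors are the square roots of the diagonal of the sandwich matrix. Because the "meat" is built from one score contribution per subject, the estimate relies on having a reasonably large number of subjects, which is the sample size condition noted in the abstract.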