Analysis of matched case-control data with incomplete strata: applying longitudinal approaches

Epidemiology. 2007 Jul;18(4):446-52. doi: 10.1097/EDE.0b013e318064630a.


Background: Matched case-control data have a structure that is similar to longitudinal data with correlated outcomes, except for a retrospective sampling scheme. In conditional logistic regression analysis, sets that are incomplete due to missing covariates and sets with identical values of the covariates do not contribute to the estimation; both situations may cause a loss in efficiency. These problems are more severe when sample sizes are small. We evaluated retrospective models for longitudinal data as alternatives in analyzing matched case-control data.
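To make the efficiency concern concrete, the following minimal sketch covers the simplest setting: 1:1 matching with a single binary exposure. In that case the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs, so concordant pairs and pairs with a missing exposure add nothing to it. The function name, the data layout, and the example counts are illustrative assumptions, not taken from the paper.

```python
import math

def conditional_or_1to1(pairs):
    """Conditional ML odds ratio for 1:1 matched pairs with a binary exposure.

    Each pair is (case_exposure, control_exposure), coded 0/1;
    None marks a missing exposure value (hypothetical data layout).
    """
    case_only = control_only = 0          # the two kinds of discordant pairs
    dropped_incomplete = dropped_concordant = 0
    for case_x, control_x in pairs:
        if case_x is None or control_x is None:
            dropped_incomplete += 1       # incomplete set: cannot be used
        elif case_x == control_x:
            dropped_concordant += 1       # identical exposures: uninformative
        elif case_x == 1:
            case_only += 1
        else:
            control_only += 1
    if case_only and control_only:
        odds_ratio = case_only / control_only
        se_log_or = math.sqrt(1 / case_only + 1 / control_only)
    else:
        odds_ratio = se_log_or = math.nan  # estimate undefined or infinite
    return {
        "odds_ratio": odds_ratio,
        "se_log_or": se_log_or,
        "informative": case_only + control_only,
        "dropped_incomplete": dropped_incomplete,
        "dropped_concordant": dropped_concordant,
    }

# Made-up example: 20 pairs, of which only 9 are discordant and complete.
pairs = ([(1, 0)] * 6 + [(0, 1)] * 3 + [(1, 1)] * 4
         + [(0, 0)] * 5 + [(1, None)] * 2)
result = conditional_or_1to1(pairs)
print(result)
```

In this made-up example only 9 of the 20 pairs enter the estimate, which is the kind of information loss the abstract describes.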

Methods: We conducted simulations to compare the properties of matched case-control data analyses using conditional likelihood and a commonly used longitudinal approach, generalized estimating equations (GEE). We simulated scenarios for one-to-one and one-to-two matching designs, each with various sizes of matching strata, with complete and incomplete strata, and with dichotomous and normal exposures.
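The abstract does not give the simulation settings, so the sketch below is only one plausible way to generate such data; it is not the authors' design. It assumes a dichotomous exposure, a stratum-specific exposure prevalence among controls drawn uniformly from 0.2 to 0.6, and exposure values missing completely at random. It also assumes that records with a missing exposure are dropped individually, so a 1:2 set still counts as usable if the case and at least one control remain and their exposures differ. All names and parameter values are hypothetical.

```python
import random

def simulate_matched_sets(n_sets, n_controls, odds_ratio, p_missing, seed=0):
    """Simulate 1:n_controls matched sets as lists of (is_case, exposure)."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        # Stratum-specific exposure prevalence among controls (assumed range).
        p0 = rng.uniform(0.2, 0.6)
        # Within a stratum, exposure odds among cases = OR x odds among controls.
        odds_case = odds_ratio * p0 / (1 - p0)
        p1 = odds_case / (1 + odds_case)
        members = [(1, int(rng.random() < p1))]
        members += [(0, int(rng.random() < p0)) for _ in range(n_controls)]
        # Delete exposure values completely at random (assumed mechanism).
        members = [(y, None if rng.random() < p_missing else x)
                   for y, x in members]
        sets.append(members)
    return sets

def is_informative(members):
    """True if the set can contribute to the conditional likelihood."""
    observed = [(y, x) for y, x in members if x is not None]
    has_case = any(y == 1 for y, _ in observed)
    has_control = any(y == 0 for y, _ in observed)
    exposure_varies = len({x for _, x in observed}) > 1
    return has_case and has_control and exposure_varies

def informative_fraction(sets):
    """Share of matched sets that remain usable in a conditional analysis."""
    return sum(is_informative(s) for s in sets) / len(sets)

# Compare 1:1 and 1:2 designs with 20% of exposure values missing.
for n_controls in (1, 2):
    sims = simulate_matched_sets(200, n_controls, odds_ratio=2.0,
                                 p_missing=0.2, seed=42)
    print(n_controls, informative_fraction(sims))
```

A fuller replication would fit conditional logistic regression and GEE to each simulated data set and compare bias, standard errors, and coverage; that step is omitted here because it depends on a statistics library and on model choices the abstract does not state.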

Results and conclusions: The simulations show that the estimates from both the conditional likelihood and GEE methods are consistent, and that both methods achieve proper coverage for binary and continuous exposures. The estimates produced by conditional likelihood have larger standard errors than those obtained by GEE. This relative loss of efficiency is more substantial when the data contain incomplete matched sets and when the data have small sizes of matching strata; it can be reduced by including more controls in the strata. The loss of efficiency also increases as the magnitude of the association increases.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies*
  • Computer Simulation*
  • Data Interpretation, Statistical
  • Humans
  • Logistic Models*
  • Longitudinal Studies*
  • Matched-Pair Analysis
  • Prospective Studies
  • Retrospective Studies