Simulation study of confounder-selection strategies

Am J Epidemiol. 1993 Dec 1;138(11):923-36. doi: 10.1093/oxfordjournals.aje.a116813.


In the absence of prior knowledge about population relations, investigators frequently employ a strategy that uses the data to help them decide whether to adjust for a variable. The authors compared the performance of several such strategies for fitting multiplicative Poisson regression models to cohort data: 1) the "change-in-estimate" strategy, in which a variable is controlled if the adjusted and unadjusted estimates differ by some important amount; 2) the "significance-test-of-the-covariate" strategy, in which a variable is controlled if its coefficient is significantly different from zero at some predetermined significance level; 3) the "significance-test-of-the-difference" strategy, which tests the difference between the adjusted and unadjusted exposure coefficients; 4) the "equivalence-test-of-the-difference" strategy, which tests the equivalence of the adjusted and unadjusted exposure coefficients; and 5) a hybrid strategy that takes a weighted average of adjusted and unadjusted estimates. Data were generated from 8,100 population structures at each of several sample sizes. The performance of the different strategies was evaluated by computing bias, mean squared error, and coverage rates of confidence intervals. At least one variation of each strategy that was examined performed acceptably. The change-in-estimate and equivalence-test-of-the-difference strategies performed best when the cut-point for deciding whether crude and adjusted estimates differed by an important amount was set to a low value (10%). The significance-test strategies performed best when the alpha level was set much higher than conventional levels (0.20).
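
The decision rules and performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the relative change is assumed to be measured against the adjusted estimate, the covariate test is assumed to be a two-sided Wald test, and the confidence intervals are assumed to be normal-theory intervals on whatever scale the estimates are expressed in. The abstract itself does not specify any of these details.

```python
import math


def change_in_estimate(crude_rr, adjusted_rr, cutoff=0.10):
    """Change-in-estimate rule (assumed form): control the covariate if the
    crude and adjusted rate ratios differ by at least `cutoff`, measured
    relative to the adjusted estimate."""
    relative_change = abs(crude_rr - adjusted_rr) / adjusted_rr
    return relative_change >= cutoff


def covariate_significance(coef, se, alpha=0.20):
    """Significance-test-of-the-covariate rule (assumed two-sided Wald test):
    control the covariate if its coefficient differs from zero at level alpha."""
    z = abs(coef / se)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(z / math.sqrt(2))
    return p_value < alpha


def performance(estimates, std_errors, true_value, z_crit=1.96):
    """Bias, mean squared error, and confidence-interval coverage across
    simulated replicates of a single population structure."""
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    covered = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z_crit * s <= true_value <= e + z_crit * s
    )
    return {"bias": bias, "mse": mse, "coverage": covered / n}


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the study.
    print(change_in_estimate(crude_rr=1.5, adjusted_rr=1.3))
    print(covariate_significance(coef=0.5, se=0.3))
    print(performance([0.9, 1.1, 1.0, 1.4], [0.1] * 4, true_value=1.0))
```

In a simulation of the kind described, each rule would be applied within every generated data set to choose between the crude and adjusted exposure estimate, and `performance` would then summarize the chosen estimates against the known true value.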

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bias
  • Cohort Studies*
  • Confidence Intervals
  • Confounding Factors, Epidemiologic*
  • Epidemiologic Methods*
  • Humans
  • Logistic Models
  • Odds Ratio
  • Regression Analysis*
  • Reproducibility of Results
  • Research Design