Using generalized additive models to reduce residual confounding

Stat Med. 2004 Dec 30;23(24):3781-801. doi: 10.1002/sim.2073.


Traditionally, confounding by continuous variables is controlled by including a linear or categorical term in a regression model. Residual confounding occurs when the effect of the confounder on the outcome is mis-modelled. A continuous representation of a covariate was previously shown to yield a less biased estimate of the adjusted exposure effect than categorization, provided the functional form of the covariate-outcome relationship is correctly specified. However, this functional form is rarely known. In contrast to parametric regression, generalized additive models (GAM) fit a smooth dose-response curve to the data without requiring a priori knowledge of the functional form. We used simulations to compare parametric multiple logistic regression with its non-parametric GAM extension in their ability to control for a continuous confounder. We also investigated several issues related to the implementation of GAM in this context, including (i) selecting the degrees of freedom and (ii) alternative criteria for including or excluding the potential confounder and for choosing between a parametric and a non-parametric representation of its effect. We investigated the impact of the shape and strength of the confounder-disease association, the sample size, and the correlation between the confounder and the exposure. Simulations showed that, when the confounder had a non-linear association with the outcome, GAM modelling, compared with a parametric representation, (i) reduced the mean squared error of the adjusted exposure effect and (ii) avoided inflation of the type I error in testing the exposure effect. When the true confounder-outcome relationship was linear, GAM performed as well as parametric logistic regression.
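To make the residual-confounding mechanism concrete, the following is a minimal Python sketch (standard library only) under assumptions of our own, not the authors' simulation design: a binary exposure with no true effect, a confounder with a U-shaped effect on both exposure and outcome, and an unpenalized piecewise-linear regression spline with fixed, hypothetical knots (`KNOTS`) standing in for the GAM smoother. It has no smoothing penalty and no degrees-of-freedom selection. The names `simulate`, `design`, and `fit_logistic` are illustrative helpers, not a library API.

```python
import math
import random


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=50, tol=1e-8):
    """Logistic regression MLE by Newton-Raphson; returns the coefficient list."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score: X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # information: X^T W X
        for x, yi in zip(rows, y):
            mu = expit(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(j, p):
                    info[j][k] += w * x[j] * x[k]
        for j in range(p):                    # fill the lower triangle
            for k in range(j):
                info[j][k] = info[k][j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical design: Z ~ Uniform(-2, 2) has a U-shaped (quadratic) effect on
# both the exposure X and the outcome Y; the true exposure log-OR is zero.
TRUE_LOG_OR = 0.0
KNOTS = (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)  # illustrative, fixed in advance


def simulate(n, seed):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-2.0, 2.0)
        x = 1 if rng.random() < expit(-1.0 + 1.5 * z * z) else 0
        y = 1 if rng.random() < expit(-2.0 + TRUE_LOG_OR * x + 1.5 * z * z) else 0
        data.append((z, x, y))
    return data


def design(z, x, flexible):
    """Columns: intercept, exposure, linear confounder [+ spline terms]."""
    row = [1.0, float(x), z]
    if flexible:
        row += [max(0.0, z - k) for k in KNOTS]  # truncated-linear spline basis
    return row


data = simulate(5000, seed=2004)
y = [yi for _, _, yi in data]
beta_linear = fit_logistic([design(z, x, False) for z, x, _ in data], y)[1]
beta_spline = fit_logistic([design(z, x, True) for z, x, _ in data], y)[1]
print(f"exposure log-OR, linear confounder term: {beta_linear:.2f}")
print(f"exposure log-OR, spline confounder term: {beta_spline:.2f}")
```

Because the U-shape is symmetric, a linear confounder term captures essentially none of it, so the linearly adjusted log-OR stays near the crude, confounded value; the spline term can follow the curve and should pull the estimate toward the true value of 0. A single replicate says nothing about mean squared error or type I error, which require repeating the simulation many times.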
When a continuous exposure is modelled non-parametrically in the presence of a continuous confounder, our results suggest that assuming a linear confounder effect while focussing on the non-linearity of the exposure-outcome relationship leads to spurious findings of non-linearity; joint non-linear modelling of the exposure and the confounder is necessary. Overall, our results suggest that using GAM to reduce residual confounding offers several improvements over conventional parametric modelling.
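The spurious exposure non-linearity can be reproduced in a similarly hedged sketch. Assumptions (ours, not the paper's): the exposure X is continuous and correlated with the confounder Z, X acts linearly on the log-odds while Z acts quadratically, and a squared exposure term stands in for a GAM smooth of the exposure, with the same kind of hypothetical fixed-knot spline (`KNOTS`) standing in for a smooth of the confounder. The coefficient of `x * x` serves as a crude curvature check; all helper names are illustrative.

```python
import math
import random


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=50, tol=1e-8):
    """Logistic regression MLE by Newton-Raphson; returns the coefficient list."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score: X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # information: X^T W X
        for x, yi in zip(rows, y):
            mu = expit(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(j, p):
                    info[j][k] += w * x[j] * x[k]
        for j in range(p):                    # fill the lower triangle
            for k in range(j):
                info[j][k] = info[k][j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


KNOTS = (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)  # hypothetical, fixed in advance


def simulate(n, seed):
    """X = Z + noise; X acts LINEARLY on the logit, Z acts quadratically."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-2.0, 2.0)
        x = z + rng.gauss(0.0, 0.5)
        y = 1 if rng.random() < expit(-2.0 + 0.5 * x + 1.5 * z * z) else 0
        data.append((z, x, y))
    return data


def design(z, x, flexible_confounder):
    """Exposure as x and x**2 (curvature check); confounder as z [+ spline]."""
    row = [1.0, x, x * x, z]
    if flexible_confounder:
        row += [max(0.0, z - k) for k in KNOTS]  # truncated-linear spline basis
    return row


data = simulate(5000, seed=7)
y = [yi for _, _, yi in data]
curv_linear_z = fit_logistic([design(z, x, False) for z, x, _ in data], y)[2]
curv_spline_z = fit_logistic([design(z, x, True) for z, x, _ in data], y)[2]
print(f"x**2 coefficient, Z linear: {curv_linear_z:.2f}")
print(f"x**2 coefficient, Z spline: {curv_spline_z:.2f}")
```

With Z entered linearly, the unmodelled curvature in Z leaks into the squared exposure term and suggests a non-linear exposure effect that the simulation did not build in; modelling Z flexibly as well should push that coefficient back toward 0. In a real analysis both terms would be smooths with data-driven degrees of freedom.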

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Confounding Factors, Epidemiologic*
  • Data Interpretation, Statistical*
  • Humans
  • Logistic Models
  • Models, Statistical*
  • Multivariate Analysis