Quantifying epidemiologic risk factors using non-parametric regression: model selection remains the greatest challenge

Stat Med. 2003 Nov 15;22(21):3369-81. doi: 10.1002/sim.1638.


Logistic regression is widely used to estimate relative risks (odds ratios) from case-control studies, but when the study exposure is continuous, standard parametric models may not accurately characterize the exposure-response curve. Semi-parametric generalized linear models provide a useful extension. In these models, the exposure of interest is modelled flexibly using a regression spline or a smoothing spline, while other variables are modelled using conventional methods. When coupled with a model-selection procedure based on minimizing a cross-validation score, this approach provides a non-parametric, objective, and reproducible method to characterize the exposure-response curve by one or several models with a favourable bias-variance trade-off. We applied this approach to case-control data to estimate the dose-response relationship between alcohol consumption and risk of oral cancer among African Americans. We did not find a uniquely 'best' model, but results using linear, cubic, and smoothing splines were consistent: there does not appear to be a risk-free threshold for alcohol consumption vis-à-vis the development of oral cancer. This finding was not apparent using a standard step-function model. In our analysis, the cross-validation curve had a global minimum and also a local minimum. In general, the phenomenon of multiple local minima makes it more difficult to interpret the results, and may present a computational roadblock to non-parametric generalized additive models of multiple continuous exposures. Nonetheless, the semi-parametric approach appears to be a practical advance.
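The procedure described above can be illustrated with a short sketch. This is not the authors' implementation. It assumes a linear regression spline in truncated-power form, K-fold cross-validated deviance as the score (the abstract does not state the exact cross-validation score used), knots placed at exposure quantiles, and simulated data with a hypothetical binary covariate `z`. All function and variable names are illustrative.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Truncated-power linear spline: column x, then (x - k)_+ for each knot."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def design(x, z, knots):
    """Intercept + flexible exposure term + conventionally modelled covariate."""
    return np.column_stack([np.ones(len(x)), linear_spline_basis(x, knots), z])

def fit_logistic(X, y, ridge=1e-6, n_iter=50):
    """Newton-Raphson for logistic regression (tiny ridge for numerical stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def cv_deviance(x, z, y, knots, n_folds=5, seed=0):
    """Mean held-out binomial deviance per observation over K folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    X = design(x, z, knots)
    total = 0.0
    for k in range(n_folds):
        held_out = folds == k
        beta = fit_logistic(X[~held_out], y[~held_out])
        eta = np.clip(X[held_out] @ beta, -30.0, 30.0)
        p = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-12, 1.0 - 1e-12)
        yk = y[held_out]
        total += -2.0 * np.sum(yk * np.log(p) + (1.0 - yk) * np.log(1.0 - p))
    return total / len(y)

# Simulated data (not from the study): exposure x, binary covariate z, outcome y.
rng = np.random.default_rng(1)
n = 400
x = rng.gamma(shape=2.0, scale=1.5, size=n)
z = rng.integers(0, 2, size=n).astype(float)
true_logit = -1.5 + 0.3 * x + 0.5 * z  # risk rises from zero exposure, no threshold
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Candidate models: 0 to 3 interior knots at exposure quantiles.
candidates = {
    0: [],
    1: np.quantile(x, [0.5]),
    2: np.quantile(x, [1 / 3, 2 / 3]),
    3: np.quantile(x, [0.25, 0.5, 0.75]),
}
scores = {m: cv_deviance(x, z, y, knots) for m, knots in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "selected number of knots:", best)
```

Printing the whole `scores` dictionary rather than only the minimizer reflects the point made in the abstract: several candidates may score almost equally well, and a cross-validation curve can have more than one local minimum, so the full curve is worth inspecting before settling on a single model.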

MeSH terms

  • Alcohol Drinking / adverse effects
  • Alcohol Drinking / epidemiology*
  • Alcohol Drinking / ethnology
  • Black or African American / statistics & numerical data*
  • Case-Control Studies
  • Dose-Response Relationship, Drug
  • Humans
  • Models, Statistical*
  • Mouth Neoplasms / epidemiology*
  • Mouth Neoplasms / ethnology
  • Regression Analysis
  • Risk Assessment / methods*
  • Risk Assessment / statistics & numerical data
  • Risk Factors
  • Statistics, Nonparametric*
  • United States / epidemiology