Stability investigations of multivariable regression models derived from low- and high-dimensional data

J Biopharm Stat. 2011 Nov;21(6):1206-31. doi: 10.1080/10543406.2011.629890.

Abstract

Multivariable regression models can link a potentially large number of variables to various kinds of outcomes, such as continuous, binary, or time-to-event endpoints. Selecting important variables and choosing the functional form for continuous covariates are key parts of building such models, but both are notoriously difficult for several reasons. Because of multicollinearity between predictors and the limited amount of information in the data, instability of the selected models can be a serious issue. For applications with a moderate number of variables, resampling-based techniques have been developed for diagnosing and improving multivariable regression models. Deriving models for high-dimensional molecular data has led to the need to adapt these techniques to settings where the number of variables is much larger than the number of observations. Three studies with a time-to-event outcome, one of which has high-dimensional data, are used to illustrate several techniques. Investigations at the covariate level and at the predictor level are seen to provide considerable insight into model stability and performance. While some areas are indicated where resampling techniques for model building still need further refinement, our case studies illustrate that these techniques can already be recommended for wider use.
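
One common resampling-based stability diagnostic at the covariate level is the bootstrap inclusion frequency: the selection procedure is repeated on many bootstrap samples, and the share of samples in which each covariate is selected is recorded. The sketch below illustrates the idea only and is not the procedure used in the paper. It assumes a continuous outcome and ordinary least squares rather than a time-to-event model, uses simulated data, and relies on a deliberately simple, hypothetical selection rule (`select_variables`, which keeps covariates with an absolute t-statistic above 2).

```python
import numpy as np


def select_variables(X, y, t_threshold=2.0):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    fit OLS on all covariates and keep those with |t| above the threshold."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (n - p - 1)  # residual variance estimate
    cov_beta = sigma2 * np.linalg.pinv(design.T @ design)
    t_values = beta / np.sqrt(np.diag(cov_beta))
    return np.abs(t_values[1:]) > t_threshold  # drop the intercept


def bootstrap_inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Share of bootstrap samples in which each covariate is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        counts += select_variables(X[idx], y[idx])
    return counts / n_boot


# Simulated data (assumption): x0 drives the outcome, x1 is correlated
# with x0 but has no effect of its own, and x2 is pure noise.
rng = np.random.default_rng(1)
n = 200
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + rng.normal(size=n)

freqs = bootstrap_inclusion_frequencies(X, y)
print(dict(zip(["x0", "x1", "x2"], freqs.round(2))))
```

A covariate selected in nearly every bootstrap sample can be regarded as stable, whereas intermediate frequencies, especially among correlated covariates, point to competition between variables and an unstable selected model.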

Publication types

  • Comparative Study

MeSH terms

  • Breast Neoplasms / epidemiology
  • Data Interpretation, Statistical*
  • Databases, Factual / statistics & numerical data*
  • Female
  • Glioma / epidemiology
  • Humans
  • Male
  • Multivariate Analysis*
  • Regression Analysis*