J Clin Epidemiol, 52 (10), 935-42

Stepwise Selection in Small Data Sets: A Simulation Study of Bias in Logistic Regression Analysis

E W Steyerberg et al. J Clin Epidemiol.

Abstract

Stepwise selection methods are widely applied to identify covariables for inclusion in regression models. One problem with stepwise selection is biased estimation of the regression coefficients. We illustrate this "selection bias" with logistic regression in the GUSTO-I trial (40,830 patients with an acute myocardial infarction). Random samples were drawn that included 3, 5, 10, 20, or 40 events per variable (EPV). Backward stepwise selection was applied in models containing 8 or 16 pre-specified predictors of 30-day mortality. We found considerable overestimation of the regression coefficients of selected covariables. The selection bias decreased with increasing EPV. With a conventional selection criterion (alpha = 0.05), the bias in the 8-predictor model exceeded 25% for 7, 3, and 1 covariables at EPV 3, 10, and 40, respectively. At the same EPV values, the bias was less than 20% for all covariables when no selection was applied. We conclude that stepwise selection may result in substantial bias of estimated regression coefficients.
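The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not use GUSTO-I data: it assumes four independent standard-normal predictors with hypothetical true coefficients, a Wald-test backward elimination at alpha = 0.05, and a hand-written Newton-Raphson logistic fit. All function names, sample sizes, and coefficient values are illustrative assumptions. It compares the average estimate of one weak predictor in the full model with its average estimate in the samples where backward selection retained it.

```python
import math
import numpy as np

# Hypothetical true effects for four synthetic predictors (not GUSTO-I values).
TRUE_BETA = np.array([1.0, 0.5, 0.3, 0.0])
INTERCEPT = -1.5


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic fit by Newton-Raphson.

    X must already contain an intercept column. Returns coefficients and
    standard errors taken from the Hessian of the last iteration.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible in degenerate samples.
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    return beta, se


def backward_select(X, y, alpha=0.05):
    """Backward elimination on two-sided Wald p-values.

    Returns a dict mapping retained column index to its refitted coefficient.
    """
    kept = list(range(X.shape[1]))
    while kept:
        design = np.column_stack([np.ones(len(y)), X[:, kept]])
        beta, se = fit_logistic(design, y)
        pvals = [math.erfc(abs(b / s) / math.sqrt(2.0))
                 for b, s in zip(beta[1:], se[1:])]
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            return dict(zip(kept, beta[1:]))
        kept.pop(worst)  # drop the least significant predictor and refit
    return {}


def simulate(n_sims=300, n=100, target=2, seed=0):
    """Compare full-model and post-selection estimates of one predictor."""
    rng = np.random.default_rng(seed)
    full, selected, events = [], [], []
    for _ in range(n_sims):
        X = rng.standard_normal((n, len(TRUE_BETA)))
        prob = 1.0 / (1.0 + np.exp(-(INTERCEPT + X @ TRUE_BETA)))
        y = (rng.random(n) < prob).astype(float)
        events.append(y.sum())
        beta_full, _ = fit_logistic(np.column_stack([np.ones(n), X]), y)
        full.append(beta_full[target + 1])
        chosen = backward_select(X, y)
        if target in chosen:
            selected.append(chosen[target])
    return {
        "true": float(TRUE_BETA[target]),
        "mean_epv": float(np.mean(events)) / len(TRUE_BETA),
        "mean_full": float(np.mean(full)),
        "mean_selected": float(np.mean(selected)),
        "selection_rate": len(selected) / n_sims,
    }


result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Because a weak predictor survives selection mainly in samples where its estimate happens to be large, the post-selection mean should sit above both the true value and the full-model mean. Raising `n` (and with it the EPV) should shrink that gap, in line with the trend the abstract reports.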
