Variable selection with stepwise and best subset approaches

Ann Transl Med. 2016 Apr;4(7):136. doi: 10.21037/atm.2016.03.35.


While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are performed automatically by software. Two R functions, stepAIC() and bestglm(), are well designed for stepwise and best subset regression, respectively. The stepAIC() function starts from a fitted full or null model, and the search method is specified in its direction argument with one of the character values "forward", "backward" or "both". The bestglm() function starts from a data frame containing the explanatory variables and the response variable, and the response variable must be in the last column. A variety of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion (AIC), because its penalty of log(n) per parameter exceeds the AIC penalty of 2 whenever the sample size n is 8 or more.
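As an illustration of the two functions, the sketch below fits a logistic regression with stepAIC() in each direction and with bestglm() under AIC and BIC. It assumes the MASS and bestglm packages are installed. The birthwt data set (shipped with MASS), the choice of seven numeric candidate predictors, and the helper kept() are illustrative choices made here and are not taken from the article. The k = log(n) setting is a standard stepAIC() argument that swaps the AIC penalty of 2 for a BIC-type penalty.

```r
# Sketch: stepwise (MASS::stepAIC) vs. best subset (bestglm::bestglm)
# logistic regression; data set and predictor choice are illustrative.
library(MASS)     # stepAIC() and the birthwt data set
library(bestglm)  # bestglm(); install.packages("bestglm") if needed

# Numeric / 0-1 candidate predictors; the binary response `low`
# is placed in the LAST column, as bestglm() requires.
bwt <- birthwt[, c("age", "lwt", "smoke", "ptl", "ht", "ui", "ftv", "low")]
candidates <- setdiff(names(bwt), "low")

full_model  <- glm(low ~ ., family = binomial, data = bwt)
null_model  <- glm(low ~ 1, family = binomial, data = bwt)
upper_scope <- reformulate(candidates)  # ~ age + lwt + ... + ftv

# Backward elimination starts from the full model.
back_aic <- stepAIC(full_model, direction = "backward", trace = FALSE)

# Forward selection starts from the null model and needs an upper scope.
fwd_aic <- stepAIC(null_model, scope = list(lower = ~ 1, upper = upper_scope),
                   direction = "forward", trace = FALSE)

# "both": variables may enter or leave at every step.
both_aic <- stepAIC(null_model, scope = list(lower = ~ 1, upper = upper_scope),
                    direction = "both", trace = FALSE)

# k = log(n) replaces the AIC penalty of 2 with a BIC-type penalty.
both_bic <- stepAIC(null_model, scope = list(lower = ~ 1, upper = upper_scope),
                    direction = "both", trace = FALSE, k = log(nrow(bwt)))

# Best subset: all 2^7 = 128 subsets are compared under each criterion.
best_aic <- bestglm(bwt, family = binomial, IC = "AIC")
best_bic <- bestglm(bwt, family = binomial, IC = "BIC")

# Hypothetical helper: predictor names kept by a fitted model.
kept <- function(fit) attr(terms(fit), "term.labels")

# One line per search strategy, listing the selected predictors.
sapply(list(backward = back_aic, forward = fwd_aic, both = both_aic,
            both_BIC = both_bic, subset_AIC = best_aic$BestModel,
            subset_BIC = best_bic$BestModel),
       function(f) paste(kept(f), collapse = " + "))
```

For the exhaustive best subset search, the BIC model can never contain more terms than the AIC model when n is 8 or more. The criteria differ only in the penalty per parameter, and a larger penalty cannot select a larger minimizer. Greedy stepwise searches carry no such guarantee, which is one reason the abstract says "usually".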

Keywords: Bayesian information criterion; Logistic regression; R; best subset; interaction; stepwise.