Hierarchical Bayesian formulations for selecting variables in regression models

Stat Med. 2012 May 20;31(11-12):1221-37. doi: 10.1002/sim.4439. Epub 2012 Jan 25.

Abstract

The objective of finding a parsimonious representation of the observed data by a statistical model that is also capable of accurate prediction is commonplace in all domains of statistical applications. The parsimony of the solutions obtained by variable selection is usually counterbalanced by a limited prediction capacity. On the other hand, methodologies that assure high prediction accuracy usually lead to models that are neither simple nor easily interpretable. Regularization methodologies have proven to be useful in addressing both prediction and variable selection problems. The Bayesian approach to regularization constitutes a particularly attractive alternative as it is suitable for high-dimensional modeling, offers valid standard errors, and enables simultaneous estimation of regression coefficients and complexity parameters via computationally efficient MCMC techniques. Bayesian regularization falls within the versatile framework of Bayesian hierarchical models, which encompasses a variety of other approaches suited for variable selection such as spike and slab models and the MC³ (Markov chain Monte Carlo model composition) approach. In this article, we review these Bayesian developments and evaluate their variable selection performance in a simulation study for the classical small p, large n setting. The majority of the existing Bayesian methodology for variable selection deals only with classical linear regression. Here, we present two applications in the contexts of binary and survival regression, where the Bayesian approach was applied to select markers prognostically relevant for the development of rheumatoid arthritis and for overall survival in acute myeloid leukemia patients.
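
For readers unfamiliar with the term, the hierarchical structure the abstract refers to can be illustrated with two standard formulations from the general literature: the Bayesian lasso written as a scale mixture of normals, and a normal-mixture spike and slab prior. This is only a sketch for the linear model. The abstract does not state the article's priors, so every symbol here is an assumption introduced for illustration (y, X, β, σ², the local scales τ_j², the complexity parameter λ, the inclusion indicators γ_j, the inclusion probability π, the variances v_0 and v_1, and the hyperparameters a and b), and the prior on σ² is omitted.

```latex
% Illustrative only: Bayesian lasso as a scale mixture of normals
\begin{align*}
y \mid \beta, \sigma^2 &\sim \mathcal{N}_n(X\beta,\ \sigma^2 I_n), \\
\beta_j \mid \tau_j^2, \sigma^2 &\sim \mathcal{N}(0,\ \sigma^2 \tau_j^2), \quad j = 1, \dots, p, \\
\tau_j^2 \mid \lambda &\sim \mathrm{Exp}(\lambda^2 / 2) \quad \text{(rate parameterization)}, \\
\lambda^2 &\sim \mathrm{Gamma}(a, b).
\end{align*}

% Illustrative only: spike and slab prior with a latent inclusion indicator
\begin{align*}
\beta_j \mid \gamma_j &\sim (1 - \gamma_j)\,\mathcal{N}(0, v_0) + \gamma_j\,\mathcal{N}(0, v_1),
  \quad 0 < v_0 \ll v_1, \\
\gamma_j \mid \pi &\sim \mathrm{Bernoulli}(\pi), \quad j = 1, \dots, p.
\end{align*}
```

In the first hierarchy, integrating out τ_j² leaves a Laplace (double-exponential) prior on β_j, which is the link to lasso-type penalties. Because λ² has its own prior, the complexity parameter is sampled alongside the coefficients instead of being tuned separately. In the second, variable selection is typically read off the posterior inclusion probabilities P(γ_j = 1 | y).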

MeSH terms

  • Arthritis, Rheumatoid / epidemiology
  • Bayes Theorem*
  • Biomarkers, Tumor / analysis
  • Cohort Studies
  • Computer Simulation / statistics & numerical data
  • Data Interpretation, Statistical
  • Female
  • Gene Expression Profiling
  • Humans
  • Leukemia, Myeloid, Acute / mortality
  • Male
  • Models, Statistical
  • Regression Analysis*
  • Software / statistics & numerical data
  • Survival Analysis

Substances

  • Biomarkers, Tumor