Sparse estimation of gene-gene interactions in prediction models

Stat Methods Med Res. 2017 Oct;26(5):2319-2332. doi: 10.1177/0962280215597261. Epub 2015 Aug 11.


Current assessment of gene-gene interactions is typically based on parallel analyses in which each interaction term is tested separately, while less attention has been paid to simultaneous estimation of interaction terms in a prediction model. As the number of interaction terms grows quickly, sparse estimation is desirable for both statistical and interpretability reasons. There is a large literature on sparse estimation, but the natural hierarchy between an interaction and its corresponding main effects requires special consideration. We describe random-effect models that impose sparse estimation of interactions under both strong- and weak-hierarchy constraints. We develop an estimation procedure based on the hierarchical-likelihood argument and show that the modelling approach is equivalent to a penalty-based method, with the advantage that the models are more transparent and flexible. We compare the procedure with some standard methods in a simulation study and illustrate its application in an analysis of a gene-gene interaction model to predict body-mass index.
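To make the two hierarchy constraints concrete, the following minimal sketch checks whether a fitted set of coefficients respects them, using the usual definitions: under strong hierarchy an interaction may be nonzero only if both of its main effects are nonzero, and under weak hierarchy only if at least one is. This is an illustration of the constraints only, not the hierarchical-likelihood estimation procedure of the paper; the function name `satisfies_hierarchy`, the gene labels, and the coefficient values are hypothetical.

```python
def satisfies_hierarchy(main, inter, kind="strong", tol=0.0):
    """Check a coefficient set against a hierarchy constraint.

    main  : dict mapping gene label -> main-effect coefficient
    inter : dict mapping (gene_j, gene_k) -> interaction coefficient
    kind  : "strong" (both parents nonzero) or "weak" (at least one nonzero)
    tol   : coefficients with absolute value <= tol count as zero
    """
    if kind not in ("strong", "weak"):
        raise ValueError("kind must be 'strong' or 'weak'")
    for (j, k), gamma in inter.items():
        if abs(gamma) <= tol:
            continue  # a zero interaction never violates hierarchy
        j_in = abs(main.get(j, 0.0)) > tol
        k_in = abs(main.get(k, 0.0)) > tol
        if kind == "strong" and not (j_in and k_in):
            return False
        if kind == "weak" and not (j_in or k_in):
            return False
    return True


# Hypothetical sparse fit: g2 has no main effect but interacts with g1.
main = {"g1": 0.8, "g2": 0.0, "g3": -0.5}
inter = {("g1", "g2"): 0.3, ("g1", "g3"): 0.2, ("g2", "g3"): 0.0}

strong_ok = satisfies_hierarchy(main, inter, "strong")
weak_ok = satisfies_hierarchy(main, inter, "weak")
print(strong_ok, weak_ok)
```

In this made-up example the g1-g2 interaction is retained although g2 has no main effect, so the fit respects weak hierarchy but violates strong hierarchy.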

Keywords: Group variable selection; hierarchical-likelihood; random-effect model; structured variable selection.

MeSH terms

  • Algorithms
  • Body Mass Index
  • Epistasis, Genetic*
  • Genetic Predisposition to Disease / genetics
  • Humans
  • Likelihood Functions
  • Models, Statistical*