Independently Interpretable Lasso for Generalized Linear Models

Masaaki Takada; Taiji Suzuki; Hironori Fujisawa

doi:10.1162/neco_a_01279

Neural Comput. 2020 Jun;32(6):1168-1221. doi: 10.1162/neco_a_01279. Epub 2020 Apr 28.

Authors

Masaaki Takada¹, Taiji Suzuki², Hironori Fujisawa³

Affiliations

¹ The Graduate University for Advanced Studies, SOKENDAI, Tokyo 190-8562, Japan; and Toshiba Corporation, Tokyo 105-0023, Japan. tkdmah@gmail.com
² The University of Tokyo, Tokyo 105-0033, Japan; PRESTO, Japan Science and Technology Agency, Saitama 332-0012, Japan; and Center for Advanced Integrated Intelligence Research, RIKEN, Tokyo 103-0027, Japan. taiji@mist.i.u-tokyo.ac.jp
³ The Institute of Statistical Mathematics, Tokyo 190-8562, Japan; The Graduate University for Advanced Studies, SOKENDAI, Tokyo 190-8562, Japan; and Center for Advanced Integrated Intelligence Research, RIKEN, Tokyo 103-0027, Japan. fujisawa@ism.ac.jp

PMID: 32343648
DOI: 10.1162/neco_a_01279

Abstract

Sparse regularization, such as $\ell_1$ regularization, is a powerful and widely used strategy for high-dimensional learning problems. Its effectiveness has been supported both practically and theoretically by several studies. However, one of its biggest issues is that its performance is quite sensitive to correlations between features. Under weak regularization, ordinary $\ell_1$ regularization selects variables that are correlated with each other, which degrades not only estimation error but also interpretability. In this letter, we propose a new regularization method, independently interpretable lasso (IILasso), for generalized linear models. The proposed regularizer suppresses the selection of correlated variables, so that each active variable affects the response independently in the model. Hence, regression coefficients can be interpreted intuitively, and performance also improves because overfitting is avoided. We analyze the theoretical properties of the IILasso and show that the proposed method is advantageous for sign recovery and achieves an almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of the IILasso.
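
The abstract does not state the regularizer's formula, so the following is only a minimal sketch of one way a penalty could discourage selecting correlated variables together. It assumes a penalty of the form $\lambda\,(\lVert\beta\rVert_1 + \tfrac{\alpha}{2}\,|\beta|^\top R\,|\beta|)$, where $R$ holds absolute pairwise feature correlations with a zero diagonal. The function names, the choice of $R$, and the parameters `lam` and `alpha` are all assumptions made for illustration, not details taken from this record.

```python
import numpy as np


def abs_correlation_weights(X):
    """Hypothetical helper: R[j, k] = |corr(x_j, x_k)| for j != k, with a zero diagonal.

    This choice of R is an assumption; the abstract does not specify it.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    return R


def correlation_aware_penalty(beta, R, lam=1.0, alpha=1.0):
    """Assumed penalty: lam * (||beta||_1 + (alpha / 2) * |beta|^T R |beta|).

    The cross term is nonzero only when two correlated features are both active.
    """
    abs_beta = np.abs(beta)
    l1_term = abs_beta.sum()
    cross_term = 0.5 * alpha * (abs_beta @ R @ abs_beta)
    return lam * (l1_term + cross_term)


# Toy design matrix: columns 0 and 1 are identical, column 2 is uncorrelated with both.
X = np.array([
    [1.0, 1.0, 1.0],
    [2.0, 2.0, -1.0],
    [3.0, 3.0, -1.0],
    [4.0, 4.0, 1.0],
])
R = abs_correlation_weights(X)

beta_correlated = np.array([1.0, 1.0, 0.0])    # activates the correlated pair
beta_uncorrelated = np.array([1.0, 0.0, 1.0])  # activates an uncorrelated pair

penalty_correlated = correlation_aware_penalty(beta_correlated, R)
penalty_uncorrelated = correlation_aware_penalty(beta_uncorrelated, R)
```

Both coefficient vectors have the same $\ell_1$ norm, so an ordinary $\ell_1$ penalty cannot tell them apart. Under the assumed form, the vector that activates the correlated pair is charged an extra cross term, while the uncorrelated pair pays only the $\ell_1$ part.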

Publication types

Research Support, Non-U.S. Gov't