Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size

Soufiane Ajana; Niyazi Acar; Lionel Bretillon; Boris P Hejblum; Hélène Jacqmin-Gadda; Cécile Delcourt; BLISAR Study Group

doi:10.1093/bioinformatics/btz135

Bioinformatics. 2019 Oct 1;35(19):3628-3634. doi: 10.1093/bioinformatics/btz135.

Authors

Soufiane Ajana¹, Niyazi Acar², Lionel Bretillon², Boris P Hejblum³,⁴, Hélène Jacqmin-Gadda⁵, Cécile Delcourt¹; BLISAR Study Group

Affiliations

¹ Inserm, Bordeaux Population Health Research Center, Team LEHA, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France.
² Centre des Sciences du Goût et de l'Alimentation, AgroSup Dijon, CNRS, INRA, Université Bourgogne Franche-Comté, Dijon, France.
³ ISPED, Inserm, Bordeaux Population Health Research Center 1219, Inria SISTM, University of Bordeaux, F-33000 Bordeaux, France.
⁴ Vaccine Research Institute (VRI), Hôpital Henri Mondor, Créteil, France.
⁵ Inserm, Bordeaux Population Health Research Center, Team Biostatistics, UMR 1219, University of Bordeaux, F-33000 Bordeaux, France.

PMID: 30931473
DOI: 10.1093/bioinformatics/btz135

Abstract

Motivation: In some prediction analyses, predictors have a natural grouping structure and selecting predictors accounting for this additional information could be more effective for predicting the outcome accurately. Moreover, in a high dimension low sample size framework, obtaining a good predictive model becomes very challenging. The objective of this work was to investigate the benefits of dimension reduction in penalized regression methods, in terms of prediction performance and variable selection consistency, in high dimension low sample size data. Using two real datasets, we compared the performances of lasso, elastic net, group lasso, sparse group lasso, sparse partial least squares (PLS), group PLS and sparse group PLS.
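For orientation, the four penalized regression methods named above differ mainly in the penalty added to the loss. The block below is a sketch in common textbook notation, not the parametrization used in the paper, which the abstract does not state. The symbols are assumptions introduced here: $\beta$ is the coefficient vector, $\beta^{(g)}$ the coefficients of group $g$ (of size $p_g$, with $G$ groups), $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0,1]$ a mixing weight.

```latex
\begin{align*}
\text{lasso:} \quad
  & P(\beta) = \lambda \lVert \beta \rVert_1 \\
\text{elastic net:} \quad
  & P(\beta) = \lambda \left( \alpha \lVert \beta \rVert_1
      + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right) \\
\text{group lasso:} \quad
  & P(\beta) = \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2 \\
\text{sparse group lasso:} \quad
  & P(\beta) = (1-\alpha)\, \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2
      + \alpha\, \lambda \lVert \beta \rVert_1
\end{align*}
```

In this notation the group lasso selects or drops whole groups, while the sparse group lasso adds an $\ell_1$ term so that individual predictors within a selected group can also be set to zero. The PLS-based variants apply analogous sparsity constraints to the loadings of a small number of latent components rather than to the regression coefficients directly.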

Results: Incorporating dimension reduction into penalized regression methods improved prediction accuracy. The sparse group PLS reached the lowest prediction error while consistently selecting a few predictors from a single group.

Availability and implementation: R code for the prediction methods is freely available at https://github.com/SoufianeAjana/Blisar.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Least-Squares Analysis
Sample Size*