J R Stat Soc Series B Stat Methodol, 72 (1), 3-25

Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection

Hyonho Chun et al. J R Stat Soc Series B Stat Methodol.

Abstract

Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high-dimensional genomic data. We show that the known asymptotic consistency of the partial least squares estimator for a univariate response does not hold in the very large p, small n paradigm. We derive a similar result for multivariate response regression with partial least squares. We then propose a sparse partial least squares (SPLS) formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genome-wide binding data.
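
The abstract describes sparse PLS only at a high level: sparse linear combinations of the predictors serve as latent directions, so dimension reduction and variable selection happen together. The sketch below illustrates that idea for a univariate response under several assumptions that are ours, not the source's. It soft-thresholds the PLS direction X^T r (r is the current residual) at eta * max|X^T r|, grows an active set of selected columns, and refits plain PLS1 on the active columns only. The function names (sparse_direction, pls1_coefficients, spls_sketch), the threshold rule and the fixed number of components are hypothetical choices for illustration. This is not presented as the authors' implementation, and it omits the tuning of eta and of the number of components (for example by cross-validation) that a real analysis would need.

```python
import numpy as np


def sparse_direction(X, r, eta):
    """Soft-threshold z = X^T r at eta * max|z| (0 <= eta < 1).

    Assumed thresholding rule: larger eta gives a sparser direction, and
    eta = 0 returns the ordinary (dense) PLS direction X^T r.
    """
    z = X.T @ r
    thresh = eta * np.max(np.abs(z))
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def pls1_coefficients(X, y, n_components):
    """Plain PLS1 (NIPALS) on centered X, y; returns the coefficient vector."""
    Xk, yk = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm == 0:  # nothing left to explain
            break
        w /= norm
        t = Xk @ w  # latent score
        tt = t @ t
        p = Xk.T @ t / tt  # X loading
        q = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - q * t  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Coefficients on the original (centered) columns: W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, Q)


def spls_sketch(X, y, n_components=2, eta=0.7):
    """Illustrative SPLS-style fit for a univariate response y (hypothetical).

    Step k: threshold X^T (residual) to get a sparse direction, add its
    nonzero columns to the active set, refit PLS1 with k components on the
    active columns only.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.zeros(p)
    active = np.zeros(p, dtype=bool)
    for k in range(1, n_components + 1):
        resid = yc - Xc @ beta
        if np.linalg.norm(resid) <= 1e-10 * np.linalg.norm(yc):
            break  # residual is numerically zero; stop adding columns
        w = sparse_direction(Xc, resid, eta)
        active |= (w != 0) | (beta != 0)
        idx = np.flatnonzero(active)
        beta = np.zeros(p)
        beta[idx] = pls1_coefficients(Xc[:, idx], yc, min(k, idx.size))
    intercept = y_mean - x_mean @ beta
    return beta, intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 40, 200  # many more predictors than observations
    X = rng.standard_normal((n, p))
    y = 5 * X[:, 0] + 0.5 * rng.standard_normal(n)
    beta, b0 = spls_sketch(X, y, n_components=2, eta=0.7)
    print("selected columns:", np.flatnonzero(beta))
```

Refitting PLS on the active columns, rather than keeping the thresholded direction itself, means the final coefficients stay exactly zero outside the selected set. That is the sense in which the fitted linear combinations are sparse.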

Figures

Fig. 1. Estimated transcription factor (TF) activities for the 21 confirmed TFs. The y-axis denotes estimated coefficients and the x-axis is time. Plots for ABF-1, CBF-1, GCR2 and SKN7 are not displayed because both the univariate and the multivariate SPLS estimated the activities of these factors as zero. The multivariate SPLS regression yields smoother estimates and exhibits periodicity. Legend (plot symbols not recoverable): one symbol marks TF activities estimated by the multivariate SPLS regression; the other marks TF activities estimated by univariate SPLS.

Fig. 2. Estimated TF activities selected only by the multivariate SPLS regression. The magnitudes of these estimated TF activities are small but consistent across the time points.

Cited by 109 PubMed Central articles
