Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data
- PMID: 29270229
- PMCID: PMC5735537
- DOI: 10.1186/s13040-017-0159-z
Abstract
Background: Feature selection and prediction are among the most important tasks in big data mining. Common strategies for feature selection in big data mining are the L1 (lasso), SCAD, and MC+ penalties. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly.
Results: In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, it is approximated by a sequence of convex optimizations with the proposed algorithm. The method is easy to implement, requiring only a few lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) for L0-penalized GLM with ultra-high dimensional big data are developed. The proposed approach outperforms other cutting-edge regularization methods, including SCAD and MC+, in simulations. When applied to an integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored.
Conclusions: The L0ADRIDGE software, implemented in MATLAB, is available at https://github.com/liuzqx/L0adridge.
Keywords: Big data mining; Classification; GLM; L0 penalty; Multi-omics data; Sparse modeling; Suboptimal debulking.
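The adaptive-ridge idea the abstract describes, approximating the non-convex L0 penalty by a sequence of convex (ridge) problems, can be sketched as iteratively reweighted ridge regression: each iteration solves a weighted ridge problem, then resets each weight to 1/(beta_j^2 + eps) so that w_j * beta_j^2 approaches the 0/1 indicator that L0 counts. The sketch below is an illustrative Python/NumPy translation for the Gaussian (least-squares) special case of the GLM; the function name, defaults, and thresholding rule are assumptions for illustration, not the paper's L0ADRIDGE code.

```python
import numpy as np

def l0_adaptive_ridge(X, y, lam=1.0, eps=1e-6, n_iter=50):
    """Sketch of L0 approximation via iteratively reweighted ridge
    (adaptive ridge). Illustrative only; not the paper's MATLAB code."""
    n, p = X.shape
    w = np.ones(p)            # per-feature ridge weights
    beta = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # Convex subproblem: (X'X + lam * diag(w)) beta = X'y
        beta = np.linalg.solve(XtX + lam * np.diag(w), Xty)
        # Reweight so that w_j * beta_j^2 ~ 1{beta_j != 0},
        # making the ridge term mimic the L0 count
        w = 1.0 / (beta ** 2 + eps)
    beta[np.abs(beta) < np.sqrt(eps)] = 0.0   # zero out vanishing coefficients
    return beta
```

Coefficients of irrelevant features shrink quadratically across iterations (small beta_j inflates w_j, which shrinks beta_j further), while large coefficients incur only an O(lam) penalty each, which is how the sequence of convex solves approaches the L0 solution.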
Conflict of interest statement
Not applicable. The authors declare that they have no competing interests. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Similar articles
- Efficient ℓ0-norm feature selection based on augmented and penalized minimization. Stat Med. 2018;37(3):473-486. doi: 10.1002/sim.7526. PMID: 29082539.
- Sparse support vector machines with L0 approximation for ultra-high dimensional omics data. Artif Intell Med. 2019;96:134-141. doi: 10.1016/j.artmed.2019.04.004. PMID: 31164207.
- Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction. Comput Math Methods Med. 2016;2016:3456153. doi: 10.1155/2016/3456153. PMID: 27843486.
- Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data. BMC Bioinformatics. 2011;12:138. doi: 10.1186/1471-2105-12-138. PMID: 21554689.
- Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data. Brief Bioinform. 2019;20(2):671-681. doi: 10.1093/bib/bby027. PMID: 29688321.
Cited by
- Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel). 2019;10(2):87. doi: 10.3390/genes10020087. PMID: 30696086.
- Biomarker discovery for predicting spontaneous preterm birth from gene expression data by regularized logistic regression. Comput Struct Biotechnol J. 2020;18:3434-3446. doi: 10.1016/j.csbj.2020.10.028. PMID: 33294138.
- OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer. PLoS Comput Biol. 2023;19(3):e1010333. doi: 10.1371/journal.pcbi.1010333. PMID: 36897911.
- Barcoded bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast. Elife. 2022;11:e73983. doi: 10.7554/eLife.73983. PMID: 35147078.