Feature Screening via Distance Correlation Learning
- PMID: 25249709
- PMCID: PMC4170057
- DOI: 10.1080/01621459.2012.695654
Abstract
This paper is concerned with feature screening in ultrahigh dimensional data analysis, a problem that has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS is as easy to implement as the sure independence screening procedure based on the Pearson correlation (SIS, for short) proposed by Fan and Lv (2008), yet it can substantially improve on the SIS. Fan and Lv (2008) established the sure screening property of the SIS under linear models, whereas the DC-SIS has the sure screening property under more general settings that include linear models as a special case. Furthermore, implementing the DC-SIS does not require specifying a model (e.g., a linear or generalized linear model) for the response or the predictors, which is a very appealing property in ultrahigh dimensional data analysis. Moreover, the DC-SIS can be applied directly to grouped predictor variables and to multivariate responses. We establish the sure screening property of the DC-SIS and conduct simulations to examine its finite sample performance. Numerical comparisons indicate that the DC-SIS performs much better than the SIS across a variety of models. We also illustrate the DC-SIS with a real data example.
Keywords: Distance correlation; sure screening property; ultrahigh dimensionality; variable selection.
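As a rough illustration of the screening step described in the abstract, the Python sketch below ranks each predictor X_k by its squared sample distance correlation with the response and keeps the top d. The estimator is the standard V-statistic form of distance correlation (Székely, Rizzo and Bakirov, 2007): double-center the pairwise Euclidean distance matrices of the two samples and normalize the average of their elementwise product. This is a minimal sketch, not the authors' code; the function names (`dcor_sq`, `dc_sis`, `sis`), the default d = floor(n / log n), and the simulated model are all assumptions made for illustration.

```python
import numpy as np


def _double_centered_distances(z):
    """Double-centered pairwise Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # a 1-D array is n observations of a single variable
    # a[k, l] = ||z_k - z_l||, computed over all columns of z
    a = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    # subtract column means and row means, then add back the grand mean
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()


def dcor_sq(x, y):
    """Squared sample distance correlation (V-statistic form); 0 if either sample is constant."""
    A = _double_centered_distances(x)
    B = _double_centered_distances(y)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float((A * B).mean() / denom) if denom > 0 else 0.0


def dc_sis(X, y, d=None):
    """Hypothetical DC-SIS sketch: rank columns of X by dcor_sq with y, keep the top d."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if d is None:
        d = max(1, int(n / np.log(n)))  # assumed default, d = floor(n / log n)
    scores = np.array([dcor_sq(X[:, k], y) for k in range(p)])
    keep = np.argsort(-scores, kind="stable")[:d]  # indices with the largest scores
    return keep, scores


def sis(X, y, d):
    """Pearson-based SIS for comparison: keep the d columns with the largest |corr(X_k, y)|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(corr), kind="stable")[:d]


# Simulated example (assumed model): X_0 enters only through X_0**2, X_1 enters linearly.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 1] + 0.5 * rng.standard_normal(n)

keep, scores = dc_sis(X, y)
print("d =", len(keep))
print("X_0 kept by DC-SIS:", 0 in keep, "| by SIS:", 0 in sis(X, y, len(keep)))
print("X_1 kept by DC-SIS:", 1 in keep, "| by SIS:", 1 in sis(X, y, len(keep)))
```

Because `dcor_sq` works with Euclidean distances between rows, passing a 2-D array of several columns as `x` (a group of predictors) or as `y` (a multivariate response) needs no change, in line with the abstract's point about grouped predictors and multivariate responses. In the simulated model the population Pearson correlation between X_0 and y is zero, since E[X_0^3] = 0 for a standard normal variable, so a Pearson-based ranking has no systematic signal for X_0, whereas its distance correlation with y is positive because the two are dependent.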
Similar articles
- Feature Screening in Ultrahigh Dimensional Cox's Model. Stat Sin. 2016;26:881-901. doi: 10.5705/ss.2014.171. PMID: 27418749.
- Model-Free Conditional Independence Feature Screening For Ultrahigh Dimensional Data. Sci China Math. 2017 Mar;60(3):551-568. doi: 10.1007/s11425-016-0186-8. Epub 2016 Dec 29. PMID: 28649265.
- Feature Screening for High-Dimensional Variable Selection in Generalized Linear Models. Entropy (Basel). 2023 May 26;25(6):851. doi: 10.3390/e25060851. PMID: 37372195.
- A generic model-free feature screening procedure for ultra-high dimensional data with categorical response. Comput Methods Programs Biomed. 2023 Feb;229:107269. doi: 10.1016/j.cmpb.2022.107269. Epub 2022 Nov 26. PMID: 36463676. Review.
- Variable Screening for Near Infrared (NIR) Spectroscopy Data Based on Ridge Partial Least Squares Regression. Comb Chem High Throughput Screen. 2020;23(8):740-756. doi: 10.2174/1386207323666200428114823. PMID: 32342803. Review.
Cited by
- Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score. J Mach Learn Res. 2022;23:262. PMID: 38098839.
- Feature selection based on distance correlation: a filter algorithm. J Appl Stat. 2020 Sep 7;49(2):411-426. doi: 10.1080/02664763.2020.1815672. eCollection 2022. PMID: 35707211.
- Regularized Quantile Regression and Robust Feature Screening for Single Index Models. Stat Sin. 2016 Jan;26(1):69-95. doi: 10.5705/ss.2014.049. PMID: 26941542.
- Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models. J Am Stat Assoc. 2014;109(507):1270-1284. doi: 10.1080/01621459.2013.879828. PMID: 25309009.
- Score test variable screening. Biometrics. 2014 Dec;70(4):862-71. doi: 10.1111/biom.12209. Epub 2014 Aug 14. PMID: 25124197.