Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE
- PMID: 29914084
- PMCID: PMC6027449
- DOI: 10.3390/genes9060301
Abstract
Feature selection, which identifies the most informative features in the original feature space, is widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and increasing efficiency. RFE produces a ranking of features, together with a list of candidate subsets and their corresponding accuracies. The subset with the highest accuracy (HA) or the subset with a preset number of features (PreNum) is often used as the final subset. However, choosing by HA may select a large number of features, and when there is no prior knowledge to guide the preset number, the PreNum choice is often ambiguous and subjective. A proper decision variant is therefore in high demand to determine the optimal subset automatically. In this study, we conduct pioneering work to explore the decision variant applied after a list of candidate subsets has been obtained from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. The random forest-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
Keywords: RFE; decision variant; feature selection; random forest; voting.
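
The two conventional decision rules named in the abstract, HA and PreNum, can be sketched as follows. This is only an illustration under stated assumptions: it takes the RFE output to be a feature ranking (best first) plus one accuracy estimate per subset size. The feature names, accuracy values, function names, and the tie-breaking rule are hypothetical and are not taken from the study. The voting strategy and the decision variants proposed in the paper are not reproduced, because the abstract does not specify them.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    """One candidate subset from an RFE run (hypothetical container)."""
    features: Tuple[str, ...]  # features retained in this subset
    accuracy: float            # estimated accuracy of a model using them


def candidates_from_ranking(ranking: Sequence[str],
                            accuracies: Sequence[float]) -> List[Candidate]:
    """Pair the top-k ranked features with the accuracy measured for size k."""
    if len(ranking) != len(accuracies):
        raise ValueError("need exactly one accuracy per subset size")
    return [Candidate(tuple(ranking[:k]), accuracies[k - 1])
            for k in range(1, len(ranking) + 1)]


def select_highest_accuracy(cands: Sequence[Candidate]) -> Candidate:
    """HA rule: most accurate subset. Ties go to the smaller subset
    (the tie-break is an assumption of this sketch)."""
    return min(cands, key=lambda c: (-c.accuracy, len(c.features)))


def select_preset_number(cands: Sequence[Candidate], n: int) -> Candidate:
    """PreNum rule: the subset containing exactly n features."""
    for c in cands:
        if len(c.features) == n:
            return c
    raise ValueError(f"no candidate subset with {n} features")


# Made-up ranking and accuracies, for illustration only.
ranking = ["f1", "f2", "f3", "f4", "f5", "f6"]
accuracies = [0.70, 0.80, 0.86, 0.87, 0.88, 0.87]

candidates = candidates_from_ranking(ranking, accuracies)
ha = select_highest_accuracy(candidates)
preset = select_preset_number(candidates, 3)

print("HA:", len(ha.features), "features, accuracy", ha.accuracy)
print("PreNum(3):", len(preset.features), "features, accuracy", preset.accuracy)
```

With these made-up numbers, HA keeps more features than the three-feature PreNum subset in exchange for a small gain in accuracy, while PreNum depends entirely on a value of `n` chosen by the user. These are the two weaknesses the abstract describes.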
Conflict of interest statement
The authors declare no conflict of interest.
