Genes (Basel). 2018 Jun 15;9(6):301.
doi: 10.3390/genes9060301.

Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE

Qi Chen et al. Genes (Basel). 2018.

Abstract

Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective in reducing data dimensionality and increasing efficiency. RFE produces a ranking of features as well as candidate subsets with their corresponding accuracies. The subset with the highest accuracy (HA) or a preset number of features (PreNum) is often used as the final subset. However, this may lead to a large number of features being selected, and when there is no prior knowledge about the preset number, the final subset selection is often ambiguous and subjective. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants to automatically select the optimal feature subset. A random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
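The RF-RFE procedure and the decision variants can be illustrated with a short Python sketch. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn's RandomForestClassifier and cross_val_score, uses synthetic data from make_classification, removes one feature per iteration, and reads "90% HA" as the smallest candidate subset whose accuracy reaches 90% of the best accuracy (an assumption on our part).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real molecular biology dataset
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
features = list(range(X.shape[1]))      # current candidate feature set
subsets, accuracies = [], []

while features:
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    # Record the candidate subset and its 10-fold cross-validated accuracy
    accuracies.append(cross_val_score(rf, X[:, features], y, cv=10).mean())
    subsets.append(list(features))
    # Rank remaining features by RF importance and eliminate the least important one
    rf.fit(X[:, features], y)
    features.pop(int(np.argmin(rf.feature_importances_)))

# HA variant: the candidate subset with the highest accuracy
ha_subset = subsets[int(np.argmax(accuracies))]

# "90% HA" variant (our reading): smallest subset reaching 90% of the best accuracy
threshold = 0.9 * max(accuracies)
small_subset = min((s for s, a in zip(subsets, accuracies) if a >= threshold), key=len)

# PreNum variant: the candidate subset with a preset size (12, as in Figure 3)
prenum_subset = next(s for s in subsets if len(s) == 12)
print(len(ha_subset), len(small_subset), len(prenum_subset))

In practice the HA subset can contain many features, which is exactly the ambiguity the decision variants and the voting strategy are meant to resolve.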

Keywords: RFE; decision variant; feature selection; random forest; voting.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The statistical analysis of the 30 most recent publications that used recursive feature elimination (RFE) for feature selection. HA: used the highest classification accuracy as the decision variant; PreNum: used a pre-defined number of features as the variant; No: no choice was made; Other: used other variants for feature selection.
Figure 2
The main procedure of the recursive feature elimination (RFE) method.
Figure 3
The three variants we analyzed in this study: HA, 90% HA, and PreNum (equal to 12). The results were obtained on the TG-Gates_500 data.
Figure 4
Voting strategy to select the optimal feature subset after the 10-fold cross-validation. Here, we assume that the top two ranked features received 7 and 5 votes, respectively.
Figure 5
The frequency of votes of the selected features in the candidate feature pool.
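
The voting step shown in Figure 4 and Figure 5 can be sketched in a few lines of Python: after the 10-fold cross-validation, each feature receives one vote per fold in which it was selected, and the final subset keeps the features with enough votes. The fold results below are contrived to reproduce the 7-vote and 5-vote example from the Figure 4 caption, and the 6-vote cutoff is our assumption, not the authors' rule.

from collections import Counter

# One list of selected features per cross-validation fold (10 folds);
# feature names and per-fold selections are hypothetical.
fold_selections = [
    ["geneA", "geneB"], ["geneA"], ["geneA", "geneB"], ["geneB"], ["geneA"],
    ["geneA", "geneB"], ["geneA"], ["geneB"], ["geneA"], [],
]

# Count how many folds selected each feature
votes = Counter(f for selected in fold_selections for f in selected)

# Keep features selected in a majority of folds (cutoff of 6 is an assumption)
final_subset = sorted(f for f, v in votes.items() if v >= 6)
print(dict(votes))       # {'geneA': 7, 'geneB': 5}
print(final_subset)      # ['geneA']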
