Feature selection may improve deep neural networks for the bioinformatics problems

Zheng Chen; Meng Pang; Zixin Zhao; Shuainan Li; Rui Miao; Yifan Zhang; Xiaoyue Feng; Xin Feng; Yexian Zhang; Meiyu Duan; Lan Huang; Fengfeng Zhou

doi:10.1093/bioinformatics/btz763

Bioinformatics. 2020 Mar 1;36(5):1542-1552. doi: 10.1093/bioinformatics/btz763.

Authors

Zheng Chen¹˒², Meng Pang¹˒², Zixin Zhao¹˒², Shuainan Li¹˒², Rui Miao¹˒², Yifan Zhang¹˒², Xiaoyue Feng¹˒², Xin Feng¹˒², Yexian Zhang¹˒², Meiyu Duan¹˒², Lan Huang¹˒², Fengfeng Zhou¹˒²

Affiliations

¹ BioKnow Health Informatics Lab, College of Computer Science and Technology.
² Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.

PMID: 31591638

Abstract

Motivation: Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have shown strong prediction performance without any feature selection. This study hypothesized that DNN models may be further improved by feature selection algorithms.

Results: A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on three conventional DNN algorithms, i.e. convolutional neural network (CNN), deep belief network (DBN) and recurrent neural network (RNN), and three recent DNNs, i.e. MobileNetV2, ShuffleNetV2 and SqueezeNet. Five binary classification methylomic datasets were used to measure the prediction performance of CNN, DBN and RNN models trained on the features selected by each of the 11 feature selection algorithms. Seventeen binary classification transcriptome datasets and two multi-class transcriptome datasets were also used to evaluate how well the hypothesis generalizes to other data types. The experimental data supported our hypothesis that feature selection algorithms may improve DNN models, and DBN models using features selected by SVM-RFE usually achieved the best prediction accuracy on the five methylomic datasets.
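SVM-RFE, the selector that usually paired best with DBN above, can be illustrated with a short sketch. This is not the authors' implementation (the abstract does not describe it); it is a minimal, dependency-free illustration of the generic SVM recursive feature elimination loop: fit a linear SVM, drop the feature with the smallest squared weight, and repeat until the requested number of features remains. Assumptions not taken from the paper: the linear SVM is trained with a Pegasos-style stochastic subgradient solver without an intercept, and the toy data generator make_toy_data and all parameter values (lam, epochs, n_keep, shift) are hypothetical. The downstream DNN stage is not shown.

```python
import random


def train_linear_svm(X, y, lam=1.0, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    Assumption (not from the paper): no intercept is fitted, so features
    should be roughly centred and classes roughly balanced.
    X: list of feature vectors; y: labels in {-1, +1}. Returns the weights w.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: move towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, **svm_kwargs):
    """SVM recursive feature elimination: refit a linear SVM on the surviving
    features and drop the one with the smallest squared weight, until n_keep
    features remain. Returns the kept column indices of X (ascending)."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[weakest]
    return remaining


def make_toy_data(n=300, n_noise=4, shift=1.0, seed=1):
    """Hypothetical toy data: columns 0 and 1 carry the class signal,
    the remaining n_noise columns are pure Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        signal = [label * shift + rng.gauss(0.0, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(signal + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected = svm_rfe(X, y, n_keep=2)
    print("kept feature indices:", selected)
    # The reduced matrix is what would then be passed on to a DNN (e.g. a DBN).
    X_reduced = [[row[j] for j in selected] for row in X]
```

Removing one feature per round mirrors the basic SVM-RFE ranking criterion; practical implementations often drop several features per round to save time on high-dimensional methylomic or transcriptomic data, where refitting once per feature would be expensive.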

Availability and implementation: All algorithms were implemented and tested in Python 3.6.6.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology*
Neural Networks, Computer*