Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine

J Theor Biol. 2019 Feb 21:463:77-91. doi: 10.1016/j.jtbi.2018.12.010. Epub 2018 Dec 8.

Abstract

The study of gene expression data provides a reference for tumor diagnosis at the molecular level. Selecting the feature genes relevant to classification from high-dimensional, small-sample gene expression data, and then successfully separating tumor subtypes or distinguishing normal from patient samples, is a challenging task. In this paper, we present a new method for tumor classification that combines relaxed Lasso (least absolute shrinkage and selection operator) with a generalized multi-class support vector machine (rL-GenSVM). First, the tumor datasets are z-score normalized. Second, relaxed Lasso is used to select feature gene sets on the training set. Finally, the generalized multi-class support vector machine (GenSVM) serves as the classifier. We use four two-class datasets and four multi-class datasets in the experiments, and compare four classifiers by their classification accuracy on the test set. For comparison with previously proposed methods, we also run 10-fold cross-validation on all samples of each dataset and obtain satisfactory classification accuracy. The experimental results show that the proposed method selects fewer feature genes and achieves higher classification accuracy than the methods it is compared with. rL-GenSVM uses regularization parameters to avoid overfitting and can be widely applied to the classification of high-dimensional, small-sample tumor data. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/rL-GenSVM/.
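
The first two steps of the pipeline described above (z-score normalization, then relaxed Lasso feature selection on the training set) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. It assumes a common two-stage formulation of relaxed Lasso: a Lasso fit with penalty `lam` chooses the gene subset, and a second Lasso fit restricted to that subset uses the weaker penalty `phi * lam` with `phi` in (0, 1]. The function names, the synthetic data, and the values of `lam` and `phi` are all hypothetical choices for illustration, and the GenSVM classification step is omitted.

```python
import random
from statistics import mean, pstdev

def zscore_fit(X):
    """Per-gene mean and standard deviation, computed on the training rows."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant genes get sd = 1
    return mus, sds

def zscore_apply(X, mus, sds):
    """Normalize rows using statistics fitted elsewhere (e.g. on the training set)."""
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, active=None, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Only the columns listed in `active` may receive nonzero coefficients.
    """
    n, p = len(X), len(X[0])
    active = range(p) if active is None else active
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = {j: sum(X[i][j] ** 2 for i in range(n)) / n for j in active}
    for _ in range(n_iter):
        for j in active:
            if col_sq[j] == 0.0:
                continue
            # Correlation of gene j with the partial residual (gene j's own fit added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def relaxed_lasso(X, y, lam, phi):
    """Stage 1 selects genes with penalty lam; stage 2 refits them with phi * lam."""
    first = lasso_cd(X, y, lam)
    support = [j for j, b in enumerate(first) if b != 0.0]
    refit = lasso_cd(X, y, phi * lam, active=support)
    return support, refit

# Synthetic two-class training set (hypothetical): gene 0 carries the class signal.
random.seed(0)
n_samples, n_genes = 40, 30
labels = [1.0 if i % 2 == 0 else -1.0 for i in range(n_samples)]
X_train = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
for row, lab in zip(X_train, labels):
    row[0] += 3.0 * lab

mus, sds = zscore_fit(X_train)
Xz = zscore_apply(X_train, mus, sds)
y = [lab - mean(labels) for lab in labels]  # centered labels, so no intercept is needed

support, coefs = relaxed_lasso(Xz, y, lam=0.3, phi=0.5)
print("selected genes:", support)
```

In a full workflow, `mus` and `sds` fitted on the training set would also be applied to the test set through `zscore_apply`, and only the columns in `support` would be passed to the classifier, so that no information from the test samples influences gene selection.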

Keywords: Feature genes; Gene expression data; Generalized multi-class support vector machine; Relaxed Lasso; Tumor classification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Datasets as Topic*
  • Gene Expression Profiling
  • Humans
  • Neoplasms / classification*
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis*
  • Software
  • Support Vector Machine*