Noise incorporated subwindow permutation analysis for informative gene selection using support vector machines

Analyst. 2011 Apr 7;136(7):1456-63. doi: 10.1039/c0an00667j. Epub 2011 Feb 14.

Abstract

Selecting a small subset of informative genes plays an important role in the accurate prediction of clinical tumor samples. Based on model population analysis, this study proposes a novel variable selection method, called noise incorporated subwindow permutation analysis (NISPA), to work with support vector machines (SVMs). The key idea of NISPA is that one noise variable is added to each sampled sub-dataset. The distribution of the importance of this added noise variable can then be computed, and it serves as a common reference against which the experimental variables are evaluated. Using the non-parametric Mann-Whitney U test, each variable is then assigned a P value that describes how much the distribution of its importance differs from that of the noise variable. All variables can be ranked by these P values, and a small subset of informative variables can then be selected to build the model. Moreover, compared with other methods, NISPA allows us for the first time to classify variables in finer detail, as informative, uninformative (noise), or interfering variables. Two microarray datasets are used to evaluate the performance of NISPA. The results show that variable selection with NISPA can significantly reduce the prediction errors of SVM classifiers. We conclude that NISPA is a good alternative variable selection algorithm.
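The abstract describes the procedure only at a high level, so the following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. It assumes NumPy, SciPy (`scipy.stats.mannwhitneyu`) and scikit-learn (`sklearn.svm.SVC`) are available. The function name `nispa_sketch` and all parameter defaults are hypothetical. Variable importance is assumed to be the increase in validation error when one column of a sub-dataset is permuted, and the rule that labels a gene as interfering when its importance is significantly lower than that of the noise variable is likewise an assumption about how the three categories are separated. The toy data are synthetic and carry no relation to the paper's microarray datasets.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC


def nispa_sketch(X, y, n_models=200, n_vars=10, train_frac=0.7, alpha=0.05, seed=0):
    """Hypothetical NISPA-style screening (a sketch, not the published code).

    Returns one Mann-Whitney P value per gene and one label per gene:
    "informative", "interfering", "uninformative" or "unsampled".
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gene_imp = [[] for _ in range(p)]  # importance samples collected for each gene
    noise_imp = []                      # importance samples of the added noise variable

    for _ in range(n_models):
        # Random sample split plus a random subset of variables (one sub-dataset).
        idx = rng.permutation(n)
        n_train = int(train_frac * n)
        train, val = idx[:n_train], idx[n_train:]
        cols = rng.choice(p, size=n_vars, replace=False)

        # Append one freshly drawn Gaussian noise column to this sub-dataset.
        Xs = np.hstack([X[:, cols], rng.standard_normal((n, 1))])
        model = SVC(kernel="linear", C=1.0).fit(Xs[train], y[train])
        base_err = np.mean(model.predict(Xs[val]) != y[val])

        # Assumed importance: rise in validation error after permuting one column.
        for j in range(n_vars + 1):
            Xp = Xs[val].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            imp = np.mean(model.predict(Xp) != y[val]) - base_err
            if j < n_vars:
                gene_imp[cols[j]].append(imp)
            else:
                noise_imp.append(imp)

    # Compare each gene's importance distribution with the shared noise reference.
    p_values = np.ones(p)
    labels = []
    for g in range(p):
        if not gene_imp[g]:
            labels.append("unsampled")
            continue
        pooled = np.concatenate([gene_imp[g], noise_imp])
        if np.ptp(pooled) > 0:  # guard against the degenerate all-tied case
            p_values[g] = mannwhitneyu(gene_imp[g], noise_imp, alternative="two-sided").pvalue
        if p_values[g] < alpha and np.mean(gene_imp[g]) > np.mean(noise_imp):
            labels.append("informative")
        elif p_values[g] < alpha:
            labels.append("interfering")
        else:
            labels.append("uninformative")
    return p_values, labels


# Synthetic toy data for illustration only: genes 0-2 carry the class signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 30))
X[:, :3] += 2.0 * (2 * y[:, None] - 1)

p_values, labels = nispa_sketch(X, y)
# Rank the genes labelled informative by P value and keep a small top subset.
ranked = [g for g in np.argsort(p_values) if labels[g] == "informative"]
selected = ranked[:5]
final_model = SVC(kernel="linear").fit(X[:, selected], y)
```

Drawing a fresh noise column for every sub-dataset gives the noise variable the same chance of being spuriously useful as an irrelevant gene, which is what lets its importance distribution act as a common reference for all genes.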

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Colon / metabolism
  • Colonic Neoplasms / genetics
  • Databases, Factual
  • Estrogens / genetics
  • Gene Expression Profiling / methods*
  • Humans
  • Models, Genetic
  • Software

Substances

  • Estrogens