A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset

Genomics. 2005 Jan;85(1):16-23. doi: 10.1016/j.ygeno.2004.09.007.


Development of a robust and efficient approach for extracting useful information from microarray data continues to be a significant and challenging task. Microarray data are characterized by a high dimension, high signal-to-noise ratio, and high correlations between genes, but with a relatively small sample size. Current methods for dimensional reduction can further be improved for the scenario of the presence of a single (or a few) high influential gene(s) in which its effect in the feature subset would prohibit inclusion of other important genes. We have formalized a robust gene selection approach based on a hybrid between genetic algorithm and support vector machine. The major goal of this hybridization was to exploit fully their respective merits (e.g., robustness to the size of solution space and capability of handling a very large dimension of feature genes) for identification of key feature genes (or molecular signatures) for a complex biological phenotype. We have applied the approach to the microarray data of diffuse large B cell lymphoma to demonstrate its behaviors and properties for mining the high-dimension data of genome-wide gene expression profiles. The resulting classifier(s) (the optimal gene subset(s)) has achieved the highest accuracy (99%) for prediction of independent microarray samples in comparisons with marginal filters and a hybrid between genetic algorithm and K nearest neighbors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Leukemic / genetics*
  • Humans
  • Lymphoma, B-Cell / genetics*
  • Models, Genetic*
  • Oligonucleotide Array Sequence Analysis / methods*