A combinational feature selection and ensemble neural network method for classification of gene expression data

BMC Bioinformatics. 2004 Sep 27:5:136. doi: 10.1186/1471-2105-5-136.

Abstract

Background: Microarray experiments are becoming a powerful tool for clinical diagnosis because they can reveal gene expression patterns that are characteristic of a particular disease. To date, this problem has received the most attention in cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have been used and compared. However, most published articles on tumor classification applied a single technique to a single dataset, and only recently have several researchers compared these techniques across several public datasets. It has been shown that differently selected features reflect different aspects of a dataset, and that some selected features yield better solutions on particular problems. At the same time, given a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics of the data using traditional methods. In this paper, we introduce a combinational feature selection method, used in conjunction with ensemble neural networks, to improve the accuracy and robustness of sample classification.

Results: We validate our new method on several recent publicly available datasets, both by predictive accuracy on testing samples and by cross-validation. Compared with the best performance of other current methods, our strategy obtains remarkably improved results across a wide range of datasets.

Conclusions: We conclude that our method can extract more information from microarray data, yielding more accurate classification, and can also help to identify latent marker genes of the diseases for better diagnosis and treatment.
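
The general idea in the abstract (select features with several different criteria, train one classifier per selected gene set, then combine the classifiers' outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the selection criteria, the network architecture, or the combination rule. The three ranking scores, the nearest-centroid classifier that stands in for a neural network, the majority vote, and the toy data below are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev, pvariance

# Three hypothetical gene-ranking criteria (assumed; not taken from the paper).
def mean_difference(x0, x1):
    return abs(mean(x0) - mean(x1))

def signal_to_noise(x0, x1):
    return abs(mean(x0) - mean(x1)) / (pstdev(x0) + pstdev(x1) + 1e-9)

def t_like(x0, x1):
    pooled = pvariance(x0) / len(x0) + pvariance(x1) / len(x1) + 1e-9
    return abs(mean(x0) - mean(x1)) / pooled ** 0.5

def top_genes(X, y, score, k):
    """Return the indices of the k genes ranked highest by `score`."""
    ranked = []
    for g in range(len(X[0])):
        x0 = [row[g] for row, label in zip(X, y) if label == 0]
        x1 = [row[g] for row, label in zip(X, y) if label == 1]
        ranked.append((score(x0, x1), g))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in ranked[:k]]

def fit_centroids(X, y, genes):
    """Nearest-centroid model: a simple stand-in for a neural network."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids

def predict_one(centroids, genes, sample):
    def sq_dist(label):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min(sorted(centroids), key=sq_dist)

def train_ensemble(X, y, k=2):
    """Train one member per selection criterion, each on its own gene subset."""
    members = []
    for score in (mean_difference, signal_to_noise, t_like):
        genes = top_genes(X, y, score, k)
        members.append((genes, fit_centroids(X, y, genes)))
    return members

def predict(members, sample):
    """Combine the members' predictions by majority vote."""
    votes = Counter(predict_one(c, g, sample) for g, c in members)
    return votes.most_common(1)[0][0]

# Toy expression matrix (rows = samples, columns = genes); fabricated for the demo.
X = [
    [5.0, 1.0, 3.0, 0.2],
    [5.2, 0.9, 2.9, 0.8],
    [4.9, 1.1, 3.1, 0.5],
    [1.0, 4.0, 3.0, 0.4],
    [1.2, 4.2, 3.2, 0.6],
    [0.9, 3.9, 2.8, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

members = train_ensemble(X, y, k=2)
print(predict(members, [5.1, 1.0, 3.0, 0.5]))
print(predict(members, [1.0, 4.1, 3.0, 0.5]))
```

Using an odd number of members avoids tied votes in the two-class case; a real study would also need to perform gene selection inside each cross-validation fold to avoid optimistic accuracy estimates.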

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Acute Disease
  • Artificial Intelligence
  • Colonic Neoplasms / classification
  • Colonic Neoplasms / genetics
  • Female
  • Gene Expression Profiling / classification*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / genetics*
  • Humans
  • Leukemia, Myeloid / classification
  • Leukemia, Myeloid / genetics
  • Lung Neoplasms / classification
  • Lung Neoplasms / genetics
  • Lymphoma, B-Cell / classification
  • Lymphoma, B-Cell / genetics
  • Lymphoma, Large B-Cell, Diffuse / classification
  • Lymphoma, Large B-Cell, Diffuse / genetics
  • Male
  • Neural Networks, Computer*
  • Oligonucleotide Array Sequence Analysis / classification*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Ovarian Neoplasms / classification
  • Ovarian Neoplasms / genetics
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Predictive Value of Tests
  • Prostatic Neoplasms / classification
  • Prostatic Neoplasms / genetics