A stable iterative method for refining discriminative gene clusters

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S18. doi: 10.1186/1471-2164-9-S2-S18.

Abstract

Background: Microarray technology is often used to identify the genes that are differentially expressed between two biological conditions. On the other hand, since microarray datasets contain a small number of samples and a large number of genes, it is usually desirable to identify small gene subsets with distinct pattern between sample classes. Such gene subsets are highly discriminative in phenotype classification because of their tightly coupling features. Unfortunately, such identified classifiers usually tend to have poor generalization properties on the test samples due to overfitting problem.

Results: We propose a novel approach combining both supervised learning with unsupervised learning techniques to generate increasingly discriminative gene clusters in an iterative manner. Our experiments on both simulated and real datasets show that our method can produce a series of robust gene clusters with good classification performance compared with existing approaches.

Conclusion: This backward approach for refining a series of highly discriminative gene clusters for classification purpose proves to be very consistent and stable when applied to various types of training samples.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis
  • Computational Biology / methods
  • Computer Simulation
  • Discriminant Analysis
  • Gene Expression Profiling / methods*
  • Humans
  • Leukemia, Myeloid, Acute / genetics
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated / methods*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics