Identification of full and partial class relevant genes

IEEE/ACM Trans Comput Biol Bioinform. 2010 Apr-Jun;7(2):263-77. doi: 10.1109/TCBB.2008.105.

Abstract

Multiclass cancer classification on microarray data has provided the feasibility of cancer diagnosis across all of the common malignancies in parallel. Using multiclass cancer feature selection approaches, it is now possible to identify genes relevant to a set of cancer types. However, besides identifying the relevant genes for the set of all cancer types, it is deemed to be more informative to biologists if the relevance of each gene to specific cancer or subset of cancer types could be revealed or pinpointed. In this paper, we introduce two new definitions of multiclass relevancy features, i.e., full class relevant (FCR) and partial class relevant (PCR) features. Particularly, FCR denotes genes that serve as candidate biomarkers for discriminating all cancer types. PCR, on the other hand, are genes that distinguish subsets of cancer types. Subsequently, a Markov blanket embedded memetic algorithm is proposed for the simultaneous identification of both FCR and PCR genes. Results obtained on commonly used synthetic and real-world microarray data sets show that the proposed approach converges to valid FCR and PCR genes that would assist biologists in their research work. The identification of both FCR and PCR genes is found to generate improvement in classification accuracy on many microarray data sets. Further comparison study to existing state-of-the-art feature selection algorithms also reveals the effectiveness and efficiency of the proposed approach.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Genetic
  • Genes*
  • Humans
  • Markov Chains
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis

Substances

  • Biomarkers, Tumor