Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis

Biostatistics. 2010 Oct;11(4):599-608. doi: 10.1093/biostatistics/kxq023. Epub 2010 May 26.

Abstract

In many high-dimensional microarray classification problems, an important task is to identify subsets of genes that best discriminate the classes. However, existing gene selection methods for microarray classification cannot identify which classes are discriminable by the selected genes. In this paper, we propose an improved linear discriminant analysis (LDA) method that simultaneously selects important genes and identifies the discriminable classes. Specifically, a pairwise fusion penalty for LDA is used to shrink the pairwise differences between class centroids for each variable and to fuse the centroids of indiscriminable classes together. Numerical results from the analysis of two gene expression profiles demonstrate that the proposed approach helps improve the interpretation of important genes in microarray classification problems.
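
The pairwise fusion idea can be illustrated with a minimal sketch. It assumes an unweighted L1 penalty on all pairwise differences of class centroids, computed gene by gene; the abstract does not give the exact form, so the penalty formula, the function names `pairwise_fusion_penalty` and `discriminable_pairs`, and the parameters `lam` and `tol` are illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations


def pairwise_fusion_penalty(centroids, lam=1.0):
    """Sum of |mu[k][j] - mu[k'][j]| over all class pairs k < k' and genes j.

    centroids: list of per-class centroid vectors (one value per gene).
    lam: hypothetical tuning parameter scaling the penalty.
    """
    n_classes = len(centroids)
    n_genes = len(centroids[0])
    total = 0.0
    for j in range(n_genes):
        for k, kp in combinations(range(n_classes), 2):
            total += abs(centroids[k][j] - centroids[kp][j])
    return lam * total


def discriminable_pairs(centroids, gene, tol=1e-8):
    """Class pairs whose centroids for `gene` are NOT fused (differ by > tol)."""
    n_classes = len(centroids)
    return [
        (k, kp)
        for k, kp in combinations(range(n_classes), 2)
        if abs(centroids[k][gene] - centroids[kp][gene]) > tol
    ]


# Toy fitted centroids: 3 classes (rows) by 2 genes (columns).
# Gene 0: all three centroids are fused, so the gene separates no classes.
# Gene 1: classes 0 and 1 are fused; class 2 stands apart.
toy_centroids = [
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 3.0],
]

penalty = pairwise_fusion_penalty(toy_centroids)
pairs_gene0 = discriminable_pairs(toy_centroids, gene=0)
pairs_gene1 = discriminable_pairs(toy_centroids, gene=1)
```

In this toy example, a gene whose centroids are fused across every class (gene 0) is effectively deselected, while gene 1 is retained and reported as separating class 2 from classes 0 and 1 only. This is the sense in which one penalty can perform both variable selection and class fusion.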

MeSH terms

  • Algorithms
  • Bias
  • Biostatistics / methods*
  • Burkitt Lymphoma / classification
  • Burkitt Lymphoma / genetics
  • Child
  • Classification / methods*
  • Computer Simulation
  • Discriminant Analysis
  • False Negative Reactions
  • False Positive Reactions
  • Gene Expression Profiling / methods*
  • Humans
  • Neuroblastoma / classification
  • Neuroblastoma / genetics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Rhabdomyosarcoma / classification
  • Rhabdomyosarcoma / genetics
  • Sarcoma, Ewing / classification
  • Sarcoma, Ewing / genetics