Multi-class cancer classification via partial least squares with gene expression profiles

Bioinformatics. 2002 Sep;18(9):1216-26. doi: 10.1093/bioinformatics/18.9.1216.

Abstract

Motivation: Discrimination between two classes, such as normal and cancer samples or two types of cancer, based on gene expression profiles is an important problem with practical implications and the potential to further our understanding of gene expression in various cancer cells. Classification or discrimination of more than two groups or classes (multi-class) is also needed. The need for multi-class discrimination methodologies is apparent in many microarray experiments where various cancer types are considered simultaneously.

Results: In this paper we present an extension of the classification methodology proposed earlier by Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50) to classify cancer samples from multiple classes. The methodologies proposed in this paper are applied to four gene expression data sets with multiple classes: (a) a hereditary breast cancer data set with (1) BRCA1-mutation, (2) BRCA2-mutation and (3) sporadic breast cancer samples, (b) an acute leukemia data set with (1) acute myeloid leukemia (AML), (2) T-cell acute lymphoblastic leukemia (T-ALL) and (3) B-cell acute lymphoblastic leukemia (B-ALL) samples, (c) a lymphoma data set with (1) diffuse large B-cell lymphoma (DLBCL), (2) B-cell chronic lymphocytic leukemia (BCLL) and (3) follicular lymphoma (FL) samples, and (d) the NCI60 data set with cell lines derived from cancers of various sites of origin. In addition, we evaluated the classification algorithms and examined the variability of the error rates using simulations based on randomization of the real data sets. We note that other methods for multi-class prediction have recently been proposed; our approach follows the line of Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50).
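
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a simple PLS-style dimension reduction (the SVD of the cross-covariance between expression values and one-hot class indicators, sometimes called PLS-SVD), followed by a nearest-centroid rule in the reduced space, and it estimates error-rate variability over repeated random train/test splits. All function names, the classifier choice, the split fraction and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pls_svd_fit(X, labels, n_components):
    """Fit PLS-SVD gene weights and class centroids (illustrative, hypothetical helper)."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot class indicators
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)
    # Left singular vectors of X'Y: gene-weight directions with maximal
    # covariance with the class indicators. Centered Y has rank G-1, so at
    # most G-1 components are informative for G classes.
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    W = U[:, :n_components]
    scores = Xc @ W
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    return {"classes": classes, "x_mean": x_mean, "W": W, "centroids": centroids}

def pls_svd_predict(model, X):
    """Project samples onto the fitted components and assign the nearest class centroid."""
    scores = (X - model["x_mean"]) @ model["W"]
    dist = ((scores[:, None, :] - model["centroids"][None, :, :]) ** 2).sum(axis=2)
    return model["classes"][dist.argmin(axis=1)]

def rerandomized_error_rates(X, labels, n_components=2, n_splits=50,
                             test_fraction=1 / 3, seed=0):
    """Test error rate for each of n_splits random train/test partitions."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        model = pls_svd_fit(X[train], labels[train], n_components)
        pred = pls_svd_predict(model, X[test])
        errors.append(np.mean(pred != labels[test]))
    return np.array(errors)

# Synthetic stand-in for an expression matrix (NOT real data):
# 60 samples x 200 "genes", three classes with shifted means in a few genes.
rng = np.random.default_rng(1)
labels = np.repeat(np.array(["A", "B", "C"]), 20)
X = rng.normal(size=(60, 200))
X[labels == "B", :10] += 3.0
X[labels == "C", 10:20] += 3.0

errors = rerandomized_error_rates(X, labels)
print("mean error:", errors.mean(), "std of error:", errors.std())
```

The spread of `errors` across splits is the quantity of interest here: with few samples and many genes, a single train/test split can give a misleadingly optimistic or pessimistic error rate.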

Contact: dnguyen@stat.tamu.edu; dmrocke@ucdavis.edu

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Breast Neoplasms / classification
  • Breast Neoplasms / genetics
  • Databases, Nucleic Acid
  • Discriminant Analysis
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Least-Squares Analysis
  • Leukemia / classification
  • Leukemia / genetics
  • Lymphoma / classification
  • Lymphoma / genetics
  • Models, Genetic
  • Models, Statistical
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Principal Component Analysis
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods
  • Tumor Cells, Cultured