A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis

Bioinformatics. 2005 Mar 1;21(5):631-43. doi: 10.1093/bioinformatics/bti033. Epub 2004 Sep 16.

Abstract

Motivation: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We seek to develop a computer system for creating powerful and reliable cancer diagnostic models from microarray data. To keep a realistic perspective on clinical applications, we focus on multicategory diagnosis. To equip the system with the optimal combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs, using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.
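
The abstract does not specify the two cross-validation designs or how gene selection is combined with them. As a minimal sketch under our own assumptions, the Python below shows one common arrangement: stratified k-fold cross-validation in which genes are re-selected from the training portion of every split, so the held-out samples never influence which genes are chosen. The between-group to within-group sum-of-squares (BW) ratio filter, the 1-nearest-neighbor learner, every function name and the toy matrix are illustrative choices, not the procedure used in the paper or in GEMS.

```python
# Illustrative sketch only: function names, the BW filter and the 1-NN learner
# are hypothetical choices, not the method used in the paper or in GEMS.
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps fold sizes balanced across classes
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def bw_ratio(X, labels, g):
    """Between-class / within-class sum of squares for gene (column) g."""
    values = [row[g] for row in X]
    overall = sum(values) / len(values)
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[label].append(v)
    between = within = 0.0
    for vs in groups.values():
        mean = sum(vs) / len(vs)
        between += len(vs) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in vs)
    return between / within if within > 0 else float("inf")


def select_genes(X, labels, n_genes):
    """Indices of the n_genes columns with the highest BW ratio."""
    scores = [(bw_ratio(X, labels, g), g) for g in range(len(X[0]))]
    return [g for _, g in sorted(scores, reverse=True)[:n_genes]]


def nearest_neighbor(train_X, train_y, x, genes):
    """1-NN prediction using squared Euclidean distance on the selected genes only."""
    def dist(row):
        return sum((row[g] - x[g]) ** 2 for g in genes)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]


def cross_validate(X, labels, k=3, n_genes=1, seed=0):
    """Stratified k-fold CV; gene selection is redone inside every training split."""
    correct = 0
    selected_per_fold = []
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        genes = select_genes(train_X, train_y, n_genes)  # held-out samples not used
        selected_per_fold.append(genes)
        for i in test_idx:
            correct += nearest_neighbor(train_X, train_y, X[i], genes) == labels[i]
    return correct / len(X), selected_per_fold


# Toy "expression matrix": gene 0 separates the classes, genes 1 and 2 are noise.
X = [[0.0, 1, 5], [0.1, 3, 1], [0.2, 5, 3],
     [5.0, 5, 1], [5.1, 1, 3], [5.2, 3, 5],
     [10.0, 3, 3], [10.1, 5, 5], [10.2, 1, 1]]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
accuracy, genes = cross_validate(X, y, k=3, n_genes=1)
print(accuracy, genes)  # → 1.0 [[0], [0], [0]]
```

Selecting genes on the full dataset before splitting would let the test samples shape the gene list and tends to make accuracy estimates optimistic; repeating the selection inside each fold avoids that.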

Results: Multicategory support vector machines (MC-SVMs) are the most effective classifiers for accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation neural networks and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and non-SVM learning algorithms. Ensemble classifiers generally do not improve on the performance of the best non-ensemble models. These results guided the construction of a software system, GEMS (Gene Expression Model Selector), that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
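
One-versus-rest is among the MC-SVM schemes named above. To illustrate the general one-versus-rest idea only (not the implementation used in the paper or in GEMS), the sketch below trains one linear binary SVM per class, that class against all the others, and assigns a sample to the class whose SVM gives the largest decision value. The binary SVM is a simple Pegasos-style stochastic subgradient solver chosen so the example is self-contained; the function names, the regularization constant `lam`, the epoch count and the toy data are all assumptions.

```python
# Illustrative one-versus-rest sketch; not the MC-SVM implementation used in the
# paper or GEMS. Solver choice, names and hyperparameters are assumptions.
import random


def train_binary_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Linear SVM (L2-regularized hinge loss) trained with Pegasos-style
    stochastic subgradient steps. y holds +1/-1 labels; returns the weights."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink step: subgradient of the (lam / 2) * ||w||^2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # hinge loss active: move w toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def add_bias(x):
    """Append a constant 1 so the last weight acts as a (regularized) intercept."""
    return list(x) + [1.0]


def train_one_vs_rest(X, labels, **svm_args):
    """One binary SVM per class: that class (+1) against all other classes (-1)."""
    Xb = [add_bias(x) for x in X]
    return {
        c: train_binary_svm(Xb, [1 if lab == c else -1 for lab in labels], **svm_args)
        for c in sorted(set(labels))
    }


def predict_one_vs_rest(models, X):
    """Assign each sample to the class whose SVM gives the largest decision value."""
    predictions = []
    for x in X:
        xb = add_bias(x)
        scores = {c: sum(wj * xj for wj, xj in zip(w, xb)) for c, w in models.items()}
        predictions.append(max(scores, key=scores.get))
    return predictions


# Toy two-gene data: three well-separated classes.
X = [[5, 0], [6, 1], [6, -1],
     [-3, 4], [-4, 5], [-2, 5],
     [-3, -4], [-4, -5], [-2, -5]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = train_one_vs_rest(X, labels)
print(predict_one_vs_rest(models, [[5, 0], [-3, 4], [-3, -4]]))
```

One-versus-rest needs only as many binary problems as there are classes; the Crammer and Singer and Weston and Watkins formulations instead optimize all class boundaries jointly in a single problem and are not shown here.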

Availability: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use.

Contact: alexander.statnikov@vanderbilt.edu.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Biomarkers, Tumor / metabolism*
  • Cluster Analysis
  • Diagnosis, Computer-Assisted / methods*
  • Gene Expression Profiling / methods*
  • Genetic Predisposition to Disease / genetics
  • Genetic Testing / methods*
  • Humans
  • Neoplasm Proteins / genetics
  • Neoplasm Proteins / metabolism*
  • Neoplasms / diagnosis*
  • Neoplasms / genetics
  • Neoplasms / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software
  • User-Computer Interface

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins