Methods for multi-category cancer diagnosis from gene expression data: a comprehensive evaluation to inform decision support system development

Alexander Statnikov; Constantin F Aliferis; Ioannis Tsamardinos

Stud Health Technol Inform. 2004;107(Pt 2):813-7.

Authors

Alexander Statnikov¹, Constantin F Aliferis, Ioannis Tsamardinos

Affiliation

¹ Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA. alexander.statnikov@vanderbilt.edu

PMID: 15360925

Abstract

Cancer diagnosis is a major clinical application area of gene expression microarray technology. We are seeking to develop a system for creating cancer diagnostic models from microarray data. To equip the system with the optimal combination of data modeling methods, we performed a comprehensive evaluation of several major classification algorithms, gene selection methods, and cross-validation designs using 11 datasets spanning 74 diagnostic categories (41 cancer types and 12 normal tissue types). The multi-category Support Vector Machine techniques of Crammer and Singer, of Weston and Watkins, and one-versus-rest were found to be the best methods; they outperformed other learning algorithms, such as K-Nearest Neighbors and Neural Networks, often to a remarkable degree. Gene selection techniques were shown to significantly improve classification performance. These results guided the development of a software system that fully automates cancer diagnostic model construction, with quality on par with or better than previously published results derived by expert human analysts.
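The abstract does not give the evaluation protocol in detail, so the following is only a minimal sketch of the general idea: a classifier and a gene selection step combined inside a cross-validation loop. Everything in it is an assumption made for illustration, not the authors' method: the synthetic data, the variance-ratio gene score, the 5-fold stratified design, and all function names are hypothetical. K-Nearest Neighbors is used because it is one of the baseline algorithms named above and is short to write; the multi-category SVM methods are not implemented here.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pvariance


def make_toy_data(n_per_class=20, n_classes=3, n_genes=50, n_informative=5, seed=0):
    """Synthetic expression matrix; only the first n_informative genes carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(4.0 * c if g < n_informative else 0.0, 1.0)
                   for g in range(n_genes)]
            X.append(row)
            y.append(c)
    return X, y


def stratified_folds(y, n_folds, seed=0):
    """Split sample indices into folds that keep roughly the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


def select_genes(X, y, k):
    """Rank genes by between-class over within-class variance; return the top-k indices."""
    classes = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        groups = [[X[i][g] for i in range(len(y)) if y[i] == c] for c in classes]
        between = pvariance([mean(v) for v in groups])
        within = mean(pvariance(v) for v in groups) + 1e-12  # guard against division by zero
        scores.append((between / within, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (squared Euclidean distance)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), label)
                   for row, label in zip(train_X, train_y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def cross_validate(X, y, n_folds=5, n_genes_kept=5, k=3):
    """Return accuracy over all held-out samples, selecting genes inside each fold."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Gene selection sees training samples only, so the held-out fold stays unseen.
        genes = select_genes([X[i] for i in train_idx],
                             [y[i] for i in train_idx], n_genes_kept)
        train_X = [[X[i][g] for g in genes] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, [X[i][g] for g in genes], k)
            correct += pred == y[i]
    return correct / len(y)


X, y = make_toy_data()
accuracy = cross_validate(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Selecting genes inside each fold, rather than once on the full dataset, is a design choice of this sketch: it keeps the held-out samples from influencing which genes are kept, which would otherwise inflate the accuracy estimate.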

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Diagnosis, Computer-Assisted
Expert Systems
Factor Analysis, Statistical
Gene Expression
Gene Expression Profiling / classification*
Humans
Neoplasms / diagnosis*
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis*
Pattern Recognition, Automated
Software