Multi-platform, multi-site, microarray-based human tumor classification

Am J Pathol. 2004 Jan;164(1):9-16. doi: 10.1016/S0002-9440(10)63090-8.


The introduction of gene expression profiling has resulted in the production of rich human data sets with potential for deciphering tumor diagnosis, prognosis, and therapy. Here we demonstrate how artificial neural networks (ANNs) can be applied to two completely different microarray platforms (cDNA and oligonucleotide), or a combination of both, to build tumor classifiers capable of deciphering the identity of most human cancers. First, 78 tumors representing eight different types of histologically similar adenocarcinoma, were evaluated with a 32k cDNA microarray and correctly classified by a cDNA-based ANN, using independent training and test sets, with a mean accuracy of 83%. To expand our approach, oligonucleotide data derived from six independent performance sites, representing 463 tumors and 21 tumor types, were assembled, normalized, and scaled. An oligonucleotide-based ANN, trained on a random fraction of the tumors (n = 343), was 88% accurate in predicting known pathological origin of the remaining fraction of tumors (n = 120) not exposed to the training algorithm. Finally, a mixed-platform classifier using a combination of both cDNA and oligonucleotide microarray data from seven performance sites, normalized and scaled from a large and diverse tumor set (n = 539), produced similar results (85% accuracy) on independent test sets. Further validation of our classifiers was achieved by accurately (84%) predicting the known primary site of origin for an independent set of metastatic lesions (n = 50), resected from brain, lung, and liver, potentially addressing the vexing classification problems imposed by unknown primary cancers. These cDNA- and oligonucleotide-based classifiers provide a first proof of principle that data derived from multiple platforms and performance sites can be exploited to build multi-tissue tumor classifiers.

Publication types

  • Comparative Study
  • Multicenter Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Gene Expression Profiling / methods*
  • Humans
  • Neoplasm Metastasis / diagnosis
  • Neoplasms / classification*
  • Neoplasms / genetics
  • Neural Networks, Computer
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity