Building an asynchronous web-based tool for machine learning classification

Proc AMIA Symp. 2002;869-73.


Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • DNA, Neoplasm
  • Gene Expression
  • Genetic Markers*
  • Humans
  • Internet
  • Logistic Models*
  • Neoplasms / classification*
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis
  • Software*


  • DNA, Neoplasm
  • Genetic Markers