Evaluating classification accuracy for modern learning approaches

Stat Med. 2019 Jun 15;38(13):2477-2503. doi: 10.1002/sim.8103. Epub 2019 Jan 30.


Deep learning neural network models such as the multilayer perceptron (MLP) and the convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, methods for evaluating their performance are not yet readily available to practitioners. We provide a tutorial on evaluating classification accuracy for various state-of-the-art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures, such as sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve, are not directly applicable, and appropriate extensions must be considered. In this paper, several important statistical concepts for multicategory classification accuracy are reviewed, and their utility for various learning algorithms is demonstrated with real medical examples. We offer problem-based R code that illustrates how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and find broader application in biostatistics.
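To make the idea of "extending" binary accuracy measures to more than two categories concrete, here is a minimal Python sketch. It is not the authors' R code and does not necessarily reproduce the measures used in the paper. It shows two common extensions: one-vs-rest sensitivity and specificity computed from a multicategory confusion matrix, and the Hand and Till (2001) M measure, which averages pairwise class-vs-class AUCs. The M measure is a swapped-in illustration. All function names are hypothetical. Scores are assumed to be predicted class-membership probabilities, one column per class.

```python
from itertools import combinations


def confusion_matrix(y_true, y_pred, classes):
    """Rows = true class, columns = predicted class (hypothetical helper)."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(cm):
    """Overall proportion of correctly classified observations."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def one_vs_rest_rates(cm):
    """(sensitivity, specificity) per class, treating each class in turn as 'positive'."""
    n = len(cm)
    total = sum(map(sum, cm))
    rates = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp)))
    return rates


def pairwise_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of P(pos score > neg score); ties count 1/2."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_m(y_true, probs, classes):
    """Hand & Till (2001) M: mean over class pairs (i, j) of [A(i|j) + A(j|i)] / 2.

    probs[n][k] is the predicted probability that observation n is in classes[k].
    """
    idx = {c: k for k, c in enumerate(classes)}
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for y, p in zip(y_true, probs) if idx[y] == i]
        in_j = [p for y, p in zip(y_true, probs) if idx[y] == j]
        a_ij = pairwise_auc([p[i] for p in in_i], [p[i] for p in in_j])  # score for class i
        a_ji = pairwise_auc([p[j] for p in in_j], [p[j] for p in in_i])  # score for class j
        total += (a_ij + a_ji) / 2
    return total / len(pairs)


# Toy usage (made-up labels, for illustration only)
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm)                      # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm))            # 0.666...
print(one_vs_rest_rates(cm))   # [(0.5, 0.75), (1.0, 0.75), (0.5, 1.0)]
```

With three or more classes, a single sensitivity/specificity pair no longer exists, so the one-vs-rest view yields one pair per class. The M measure collapses pairwise discrimination into one AUC-like number, where 0.5 corresponds to uninformative scores. Other extensions, such as the volume under the ROC surface, treat all classes jointly instead of pairwise.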

Keywords: R package; convolutional neural net; deep learning; multilayer perceptron; mxnet; neural network.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biopsy, Fine-Needle
  • Biostatistics / methods*
  • Breast Neoplasms / pathology
  • Decision Trees
  • Deep Learning*
  • Discriminant Analysis
  • Female
  • Humans
  • Leukemia / genetics
  • Logistic Models
  • Probability
  • Support Vector Machine