An efficient variance estimator of AUC and its applications to binary classification

Stat Med. 2020 Dec 10;39(28):4281-4300. doi: 10.1002/sim.8725. Epub 2020 Sep 10.

Abstract

The area under the ROC (receiver operating characteristic) curve, AUC, is one of the most commonly used measures for evaluating the performance of a binary classifier. Because of sampling variation, the model with the largest observed AUC score is not necessarily optimal, so it is crucial to assess the variation of the AUC estimate. We extend the proposal by Wang and Lindsay and devise an unbiased variance estimator of the AUC estimate that is of a two-sample U-statistic form. The proposal can be easily generalized to estimate the variance of a K-sample U-statistic (K ≥ 2). To make the variance estimator more widely applicable, we employ a partition-resampling scheme that is computationally efficient. Simulation studies suggest that the proposed AUC variance estimator performs comparably to, or much better than, the jackknife and bootstrap variance estimators, with computation that is about 10 to 30 times faster. In practice, the proposal can be used in the one-standard-error rule for model selection, or to construct an asymptotic confidence interval for AUC in binary classification. In addition to the simulation studies, we illustrate its practical applications using two real datasets from the medical sciences.
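As an illustration of how an AUC estimate and its standard error might be used in the two applications named above, the following minimal Python sketch computes the empirical AUC in its standard two-sample U-statistic (Mann-Whitney) form, builds a normal-approximation confidence interval, and applies a one-standard-error rule. It does not implement the variance estimator or the partition-resampling scheme of the paper, whose details are not given in this abstract: the standard error is taken as an input. The function names, the tie-handling weight of 0.5, the clipping of the interval to [0, 1], and the representation of a model as a `(name, complexity, auc, std_error)` tuple are all assumptions made for this example.

```python
def auc_u_statistic(pos_scores, neg_scores):
    """Empirical AUC as a two-sample U-statistic.

    Averages the kernel h(x, y) = 1 if x > y, 0.5 if x == y, 0 otherwise
    over every (positive, negative) pair of classifier scores.
    The 0.5 weight for ties is a common convention, assumed here.
    """
    total = 0.0
    for x in pos_scores:
        for y in neg_scores:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def asymptotic_interval(auc, std_error, z=1.96):
    """Normal-approximation interval auc +/- z * std_error, clipped to [0, 1].

    std_error is supplied by the caller (from any variance estimator);
    z = 1.96 corresponds to a nominal 95% level.
    """
    lower = max(0.0, auc - z * std_error)
    upper = min(1.0, auc + z * std_error)
    return lower, upper


def one_standard_error_rule(models):
    """Pick the simplest model whose AUC is within one SE of the best AUC.

    models: list of (name, complexity, auc, std_error) tuples (hypothetical
    layout); smaller complexity means a simpler model.
    """
    best = max(models, key=lambda m: m[2])
    threshold = best[2] - best[3]  # best AUC minus its standard error
    eligible = [m for m in models if m[2] >= threshold]
    return min(eligible, key=lambda m: m[1])[0]


if __name__ == "__main__":
    # Made-up scores and standard errors, for illustration only.
    auc = auc_u_statistic([0.9, 0.8, 0.4], [0.7, 0.4, 0.1])
    print(auc, asymptotic_interval(auc, 0.1))
    candidates = [
        ("small", 1, 0.80, 0.03),
        ("medium", 2, 0.84, 0.03),
        ("large", 3, 0.85, 0.03),
    ]
    print(one_standard_error_rule(candidates))
```

The pairwise loop costs time proportional to the product of the two sample sizes, which is adequate for a sketch; a rank-based computation would be the usual choice for large samples.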

Keywords: AUC; ROC; U-statistic; binary classification; one-standard-error rule; variance estimation.

MeSH terms

  • Area Under Curve*
  • Computer Simulation
  • Humans
  • ROC Curve