An efficient variance estimator of AUC and its applications to binary classification

Qing Wang; Alexandria Guo

doi:10.1002/sim.8725

Stat Med. 2020 Dec 10;39(28):4281-4300. doi: 10.1002/sim.8725. Epub 2020 Sep 10.

Authors

Qing Wang¹, Alexandria Guo¹

Affiliation

¹ Department of Mathematics, Wellesley College, Wellesley, Massachusetts.

PMID: 32914457
DOI: 10.1002/sim.8725

Abstract

The area under the receiver operating characteristic (ROC) curve, AUC, is one of the most commonly used measures of the performance of a binary classifier. Because of sampling variation, the model with the largest observed AUC is not necessarily optimal, so it is crucial to assess the variability of the AUC estimate. We extend the proposal of Wang and Lindsay and devise an unbiased variance estimator for the AUC estimate that has a two-sample U-statistic form. The proposal generalizes easily to estimating the variance of a K-sample U-statistic (K ≥ 2). To make the variance estimator more practical, we employ a computationally efficient partition-resampling scheme. Simulation studies suggest that the proposed AUC variance estimator performs comparably to or much better than the jackknife and bootstrap variance estimators, while running about 10 to 30 times faster. In practice, the proposal can be used in the one-standard-error rule for model selection, or to construct an asymptotic confidence interval for AUC in binary classification. Beyond the simulation studies, we illustrate these applications on two real datasets from the medical sciences.
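As a rough illustration of the quantities named in the abstract, the sketch below computes the empirical AUC as a two-sample U-statistic (the Mann-Whitney form, with ties counted as one half), then shows how a variance estimate could feed an asymptotic normal confidence interval and a one-standard-error rule. It does not implement the authors' unbiased estimator or their partition-resampling scheme, which the abstract does not specify. The function names, the scores, and the variance and standard-error values are hypothetical placeholders, and the selection rule follows the usual textbook convention of picking the simplest model within one standard error of the best.

```python
from math import sqrt
from statistics import NormalDist


def auc_u_statistic(pos_scores, neg_scores):
    """Empirical AUC as a two-sample U-statistic.

    Kernel: h(x, y) = 1 if x > y, 0.5 if x == y, 0 otherwise,
    averaged over all (positive, negative) score pairs.
    """
    total = 0.0
    for x in pos_scores:
        for y in neg_scores:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def wald_interval(auc, variance, level=0.95):
    """Asymptotic normal interval for AUC, clipped to [0, 1].

    `variance` is whatever estimate of Var(AUC-hat) is available;
    how to obtain it is outside the scope of this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance)
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


def one_standard_error_choice(candidates):
    """Pick the simplest model whose AUC is within one SE of the best.

    `candidates` is a list of (name, auc, standard_error) tuples,
    ordered from simplest to most complex (an assumed convention).
    """
    _, best_auc, best_se = max(candidates, key=lambda c: c[1])
    threshold = best_auc - best_se
    for name, model_auc, _ in candidates:
        if model_auc >= threshold:
            return name


# Hypothetical classifier scores for diseased (pos) and healthy (neg) cases.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.4, 0.2, 0.1]
auc = auc_u_statistic(pos, neg)

# Placeholder variance, not the output of any real estimator.
lower, upper = wald_interval(auc, variance=0.01)

# Placeholder AUCs and standard errors for three nested models.
chosen = one_standard_error_choice(
    [("small", 0.84, 0.03), ("medium", 0.86, 0.02), ("large", 0.875, 0.03)]
)
```

The double loop costs O(mn) for m positive and n negative cases, which is fine for a sketch; a rank-based computation would be the usual choice for large samples.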

Keywords: AUC; ROC; U-statistic; binary classification; one-standard-error rule; variance estimation.

MeSH terms

Area Under Curve*
Computer Simulation
Humans
ROC Curve