Tests of equivalence and non-inferiority for diagnostic accuracy based on the paired areas under ROC curves

Stat Med. 2006 Apr 15;25(7):1219-38. doi: 10.1002/sim.2358.


Assessment of equivalence or non-inferiority in accuracy between two diagnostic procedures often involves comparing paired areas under the receiver operating characteristic (ROC) curves. Given pre-specified, clinically meaningful limits, the current approach to evaluating equivalence is to perform two one-sided tests (TOST) on the difference in paired areas under the ROC curves, estimated by the non-parametric method. We propose using the standardized difference to assess equivalence or non-inferiority in diagnostic accuracy based on paired areas under the ROC curves of two diagnostic procedures. A bootstrap technique is also suggested for both the non-parametric method and the standardized difference approach. A simulation study was conducted to investigate empirically the size and power of the four methods for various combinations of distributions, data types, sample sizes, and correlations. The simulation results demonstrate that the bootstrap procedure of the standardized difference approach not only adequately controls the type I error rate at the nominal level but also provides equivalent power under both symmetrical and skewed distributions. A numerical example using published data illustrates the proposed methods.
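
To make the testing logic concrete, the following is a minimal sketch of a bootstrap TOST on the raw difference in paired non-parametric areas. It is not the authors' implementation and does not reproduce the standardized difference, whose exact form the abstract does not give. Everything in it is an assumption made for illustration: the function names `auc_mann_whitney` and `bootstrap_tost_paired_auc`, the percentile bootstrap interval, the resampling of subjects separately within the diseased and non-diseased groups, and the symmetric margin `delta`. Equivalence is declared when the (1 - 2 * alpha) interval for the difference lies entirely within (-delta, delta), which is the usual confidence-interval form of TOST; non-inferiority of procedure 1 needs only the lower limit to exceed -delta.

```python
import numpy as np


def auc_mann_whitney(diseased, healthy):
    """Non-parametric AUC: P(D > H) + 0.5 * P(D == H) over all pairs."""
    d = np.asarray(diseased, dtype=float)[:, None]
    h = np.asarray(healthy, dtype=float)[None, :]
    return float(np.mean((d > h) + 0.5 * (d == h)))


def bootstrap_tost_paired_auc(dis, hea, delta, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap TOST for AUC_1 - AUC_2 (illustrative sketch).

    dis, hea: arrays of shape (n, 2); each row is one subject and column k
    holds that subject's score under procedure k, so pairing is preserved.
    delta: hypothetical equivalence margin chosen by the analyst.
    """
    rng = np.random.default_rng(seed)
    dis = np.asarray(dis, dtype=float)
    hea = np.asarray(hea, dtype=float)

    def diff(d, h):
        return auc_mann_whitney(d[:, 0], h[:, 0]) - auc_mann_whitney(d[:, 1], h[:, 1])

    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole subjects (rows) within each disease group.
        i = rng.integers(0, len(dis), size=len(dis))
        j = rng.integers(0, len(hea), size=len(hea))
        diffs[b] = diff(dis[i], hea[j])

    # Limits of the (1 - 2*alpha) percentile interval.
    lower, upper = np.quantile(diffs, [alpha, 1.0 - alpha])
    return {
        "difference": diff(dis, hea),
        "ci": (float(lower), float(upper)),
        "equivalent": bool(lower > -delta and upper < delta),
        "non_inferior": bool(lower > -delta),
    }


if __name__ == "__main__":
    # Simulated paired scores, purely illustrative (not data from the paper).
    rng = np.random.default_rng(1)
    diseased = rng.normal(loc=1.0, size=(60, 1)) + rng.normal(scale=0.5, size=(60, 2))
    healthy = rng.normal(loc=0.0, size=(60, 1)) + rng.normal(scale=0.5, size=(60, 2))
    print(bootstrap_tost_paired_auc(diseased, healthy, delta=0.10))
```

Resampling rows rather than individual scores keeps the within-subject correlation between the two procedures intact, which is what makes the comparison paired.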

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical*
  • Diagnostic Services / standards
  • Diagnostic Services / statistics & numerical data*
  • Humans
  • Models, Statistical*
  • ROC Curve*
  • Randomized Controlled Trials as Topic / methods*
  • Research Design
  • Sample Size
  • Sensitivity and Specificity
  • Statistics, Nonparametric
  • Therapeutic Equivalency*