Hypothesis testing in noninferiority and equivalence MRMC ROC studies

Acad Radiol. 2012 Sep;19(9):1158-65. doi: 10.1016/j.acra.2012.04.011. Epub 2012 Jun 19.

Abstract

Rationale and objectives: Conventional multireader multicase receiver operating characteristic (MRMC ROC) methodologies use hypothesis testing to test differences in diagnostic accuracies among several imaging modalities. The general MRMC ROC analysis framework is designed to show that one modality is statistically different from the others in a set of competing modalities (ie, the superiority setting). In practice, one may instead wish to show that the diagnostic accuracy of a modality is noninferior or equivalent, in a statistical sense, to that of another modality, rather than showing that it is superior (a higher bar). The purpose of this article is to investigate the appropriate adjustments to the conventional MRMC ROC hypothesis testing methodology for the design and analysis of noninferiority and equivalence hypothesis tests.

Materials and methods: We present three methodological adjustments to the updated and unified Obuchowski-Rockette (OR)/Dorfman-Berbaum-Metz (DBM) MRMC ROC method for use in statistical noninferiority/equivalence testing: 1) the appropriate statement of the null and alternative hypotheses; 2) a method for analyzing the experimental data; and 3) a method for sizing MRMC noninferiority/equivalence studies. We provide a clinical example to further illustrate the analysis of and sizing/power calculation for noninferiority MRMC ROC studies and give some insights on the interplay of effect size, noninferiority margin parameter, and sample sizes.
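
As an illustration of the first adjustment, the noninferiority hypotheses can be written in the following generic form. This is a sketch in assumed notation, not necessarily the notation of the article: \theta_{\text{new}} and \theta_{\text{std}} denote the reader-averaged accuracies (eg, areas under the ROC curve) of the new and standard modalities, and \delta > 0 denotes a prespecified noninferiority margin.

```latex
% Assumed notation (not taken from the article):
%   \theta_{\text{new}}, \theta_{\text{std}} : reader-averaged accuracies of the two modalities
%   \delta > 0                               : prespecified noninferiority margin
\begin{align*}
H_0 &: \theta_{\text{new}} - \theta_{\text{std}} \le -\delta
  && \text{(new modality is inferior by at least } \delta\text{)} \\
H_1 &: \theta_{\text{new}} - \theta_{\text{std}} > -\delta
  && \text{(new modality is noninferior)}
\end{align*}
```

Under this formulation, the burden of proof lies on demonstrating noninferiority: failing to reject H_0 does not establish that the new modality is inferior. Setting \delta = 0 reduces the pair to a one-sided superiority test.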

Results: We provide detailed analysis and sizing computation procedures for a noninferiority MRMC ROC study using our method adjusted from the updated and unified OR/DBM MRMC method. Likewise, we show that an equivalence hypothesis test is identical to performing two simultaneous noninferiority tests (ie, each modality is noninferior to the other).
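
The relationship between equivalence and noninferiority testing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the article: the function names are hypothetical, the estimated accuracy difference and its standard error are taken as given inputs (in an MRMC analysis they would come from the OR/DBM variance model), and a large-sample normal approximation is swapped in for the t reference distribution with method-specific degrees of freedom that an MRMC analysis would typically use.

```python
from statistics import NormalDist


def noninferiority_p(diff, se, margin):
    """One-sided p-value for H0: diff <= -margin versus H1: diff > -margin.

    diff   : estimated accuracy difference (new minus standard)
    se     : standard error of diff (assumed to be supplied by the MRMC analysis)
    margin : noninferiority margin, a positive number
    Uses a normal approximation (an assumption of this sketch).
    """
    z = (diff + margin) / se
    return 1.0 - NormalDist().cdf(z)


def equivalence_test(diff, se, margin, alpha=0.05):
    """Equivalence as two simultaneous one-sided noninferiority tests."""
    p_new = noninferiority_p(diff, se, margin)    # new noninferior to standard
    p_std = noninferiority_p(-diff, se, margin)   # standard noninferior to new
    p_value = max(p_new, p_std)                   # both tests must reject
    return {"p_new": p_new, "p_std": p_std,
            "p_value": p_value, "equivalent": p_value < alpha}


def one_sided_bounds(diff, se, alpha=0.05):
    """Two-sided (1 - 2*alpha) confidence interval for diff."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the article.
    result = equivalence_test(diff=-0.01, se=0.015, margin=0.05)
    lower, upper = one_sided_bounds(diff=-0.01, se=0.015)
    print(result)
    print(lower, upper)
```

In this sketch, equivalence at level alpha is declared exactly when the (1 - 2*alpha) confidence interval for the difference lies entirely inside (-margin, +margin), which is one reason the confidence interval is useful to report alongside the test result.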

Conclusion: Conventional MRMC ROC methodology developed for superiority studies can and should be adjusted appropriately for the design and analysis of noninferiority/equivalence hypothesis tests. In addition, the confidence interval of the difference in diagnostic accuracies is important information and should generally accompany the statistical analysis and any conclusions drawn from the hypothesis testing.

MeSH terms

  • Analysis of Variance
  • Aortic Aneurysm / diagnosis
  • Aortic Dissection / diagnosis
  • Diagnostic Imaging*
  • Humans
  • Magnetic Resonance Imaging
  • Models, Statistical
  • Observer Variation
  • ROC Curve*