"Binary" and "non-binary" detection tasks: are current performance measures optimal?

Acad Radiol. 2007 Jul;14(7):871-6. doi: 10.1016/j.acra.2007.03.014.


In observer studies of several detection tasks, we have observed that a very large fraction of responses fall in the extreme ranges, lower than 11% or higher than 89%, regardless of whether the abnormality in question is actually present and regardless of its subjectively rated "subtleness." This observation raises questions about the validity and appropriateness of using multicategory rating scales for such detection tasks. Monte Carlo simulation of binary and multicategory ratings for these tasks demonstrates that binary ratings often yield a less biased and more precise summary index and hence may provide higher statistical power for detecting differences between modalities.
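The abstract does not describe the simulation design, so the following is only a minimal Python sketch of how such a comparison could be set up, not the authors' method. It assumes hypothetical ratings on a 0-100 scale that cluster below 11 or above 89, an empirical (Mann-Whitney) area under the ROC curve as the summary index, and a binary rating obtained by thresholding at 50. The function names, the cluster probabilities, and the sample sizes are all illustrative assumptions.

```python
import random
import statistics


def empirical_auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def draw_rating(rng, p_high):
    """Hypothetical 0-100 rating that falls in an extreme range (assumption).

    With probability p_high the rating is in 90..100, otherwise in 0..10.
    """
    if rng.random() < p_high:
        return rng.randint(90, 100)
    return rng.randint(0, 10)


def simulate(n_rep=200, n_cases=50, p_high_pos=0.8, p_high_neg=0.2, seed=0):
    """Return per-replicate AUCs for multicategory and binary ratings."""
    rng = random.Random(seed)
    multi, binary = [], []
    for _ in range(n_rep):
        pos = [draw_rating(rng, p_high_pos) for _ in range(n_cases)]
        neg = [draw_rating(rng, p_high_neg) for _ in range(n_cases)]
        # Multicategory index: AUC computed on the full 0-100 ratings.
        multi.append(empirical_auc(neg, pos))
        # Binary index: the same ratings collapsed to 0/1 at an assumed cutoff of 50.
        binary.append(
            empirical_auc([int(r >= 50) for r in neg], [int(r >= 50) for r in pos])
        )
    return multi, binary


if __name__ == "__main__":
    multi, binary = simulate()
    for name, values in (("multicategory", multi), ("binary", binary)):
        print(name, statistics.mean(values), statistics.stdev(values))
```

Comparing the mean and standard deviation of the two lists across replicates gives a rough view of the bias and precision of each summary index under these assumed distributions. Whether the binary index comes out ahead depends on the rating model chosen.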

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Area Under Curve
  • Clinical Competence*
  • Diagnosis, Computer-Assisted
  • Diagnostic Imaging*
  • Humans
  • Lung Diseases / diagnosis*
  • Monte Carlo Method*
  • Observer Variation*
  • ROC Curve
  • Reproducibility of Results