Unified measurement of observer performance in detecting and localizing target objects on images

Med Phys. 1996 Oct;23(10):1709-25. doi: 10.1118/1.597758.


In this paper, methods used to measure observer performance are reviewed, and a simple general model for finding and reporting target objects in gray-scale image backgrounds is presented. That model provides the basis for a combined measurement of detection and localization performance in various image-interpretation tasks, whether by human observers or by realized computer algorithms. The model assumes that (1) an observer's detection response and first choice of target location both depend on the "maximally suspicious" finding on an image, (2) a correct (first-choice) localization of the actual target occurs if and only if its location is selected as the most suspicious, and (3) a target's presence does not alter the degree of suspicion engendered by any other (normal) image findings. Formalization of these assumptions relates the ROC curve, which measures the ability to discriminate between images containing targets and images without targets, to the "Localization Response" (LROC) curve, which measures the conjoint ability to detect and correctly localize the actual targets in those images. A maximum-likelihood statistical procedure, developed for a two-parameter "binormal" version of this model, concurrently fits both the ROC and LROC curves from an observer's image ratings and target localizations for a set of image interpretations. The model's application is illustrated (and compared to standard ROC analysis) using sets of rating and localization data from radiologists asked to search chest films for pulmonary nodules. This model is then extended to multiple-report ("free-response") interpretations of multiple-target images, under the stringent requirement that an observer's detection capability and criterion for reporting possible targets both remain stationary across images and across the successive reports made on a given image. That extended model yields formulations and predictions for the so-called "Free-Response" (FROC) curve, and for a recently proposed "Alternative FROC" (AFROC) curve. Tests of that model's "stationarity" assumptions are illustrated using radiologists' free-search interpretations of chest films for pulmonary nodules, and they suggest that human observers may often violate those assumptions when making multiple-report interpretations of images.
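
Assumptions (1) to (3) of the single-report model can be made concrete with a short numerical sketch. The abstract gives no formulas, so the distributional choices here are illustrative assumptions and may differ from the paper's own "binormal" parametrization: the most suspicious normal finding on an image is taken to be a standard normal variable X, the target's suspicion is an independent normal variable Y with hypothetical mean `mu` and standard deviation `sigma`, and the image rating is max(X, Y). All function names are hypothetical. Under those assumptions the false-positive fraction at threshold t is P(X > t), the true-positive fraction is P(max(X, Y) > t), and the LROC ordinate is P(Y > t and Y > X).

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(y, mu, sigma):
    """Density of a normal variable with mean mu and std dev sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fpf(t):
    """ROC abscissa: P(X > t) on target-free images (assumed X ~ N(0, 1))."""
    return 1.0 - phi(t)


def tpf(t, mu, sigma):
    """ROC ordinate: P(max(X, Y) > t) on target images.

    Assumption (3) lets X keep the same distribution when a target is
    present, so P(max <= t) factors into phi(t) * phi((t - mu) / sigma).
    """
    return 1.0 - phi(t) * phi((t - mu) / sigma)


def lroc(t, mu, sigma, steps=4000):
    """LROC ordinate: P(Y > t and Y > X), by trapezoidal integration.

    Assumption (2): localization is correct only when the target is the
    most suspicious finding, giving the integral of f_Y(y) * phi(y)
    over y from t to infinity (truncated here at 10 sigma above mu or t).
    """
    def integrand(y):
        return normal_pdf(y, mu, sigma) * phi(y)

    upper = max(t, mu) + 10.0 * sigma
    h = (upper - t) / steps
    total = 0.5 * (integrand(t) + integrand(upper))
    for i in range(1, steps):
        total += integrand(t + i * h)
    return total * h


if __name__ == "__main__":
    mu, sigma = 1.5, 1.0  # hypothetical parameter values
    for t in (-1.0, 0.0, 1.0, 2.0):
        print(t, fpf(t), tpf(t, mu, sigma), lroc(t, mu, sigma))
```

In this sketch the LROC curve never rises above the ROC curve at the same threshold, and as the threshold becomes very lenient it approaches P(Y > X) rather than 1, because a detected target can still be mislocalized when a normal finding looks more suspicious.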

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Algorithms
  • Discrimination, Psychological
  • False Positive Reactions
  • Humans
  • Lung / diagnostic imaging
  • Lung Neoplasms / diagnostic imaging
  • Models, Theoretical
  • Observer Variation*
  • Probability
  • Radiography / standards*
  • Radiography, Thoracic / standards
  • Reproducibility of Results