The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations

Radiology. 2008 Oct;249(1):47-53. doi: 10.1148/radiol.2491072025. Epub 2008 Aug 5.


Purpose: To compare radiologists' performance during interpretation of screening mammograms in the clinic with their performance when reading the same mammograms in a retrospective laboratory study.

Materials and methods: This study was conducted under an institutional review board-approved, HIPAA-compliant protocol; the need for informed consent was waived. Using a screening Breast Imaging Reporting and Data System (BI-RADS) rating scale, nine experienced radiologists rated an enriched set of mammograms that they had personally read in the clinic (the "reader-specific" set), mixed with an enriched "common" set of mammograms that none of the participants had previously read in the clinic. The original clinical recommendations to recall the women for a diagnostic work-up, for both the reader-specific and common sets, were compared with the recommendations made during the retrospective experiment. The results are presented in terms of reader-specific and group-averaged sensitivity and specificity levels and the dispersion (spread) of reader-specific performance estimates.
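
The summary measures named above (reader-specific and group-averaged sensitivity and specificity, plus the spread of the reader-specific estimates) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and input layout are hypothetical, and the population standard deviation is used here only as one plausible dispersion measure, since the abstract does not state which statistic was used.

```python
from statistics import mean, pstdev


def sensitivity_specificity(recalled, has_cancer):
    """Return (sensitivity, specificity) for one reader.

    recalled   -- list of bools, True if the reader recommended recall
    has_cancer -- list of bools, True if the case was a verified cancer
    Assumes at least one cancer and one non-cancer case per reader;
    otherwise the divisions below would raise ZeroDivisionError.
    """
    pairs = list(zip(recalled, has_cancer))
    tp = sum(r and c for r, c in pairs)            # recalled, cancer
    fn = sum((not r) and c for r, c in pairs)      # missed cancer
    tn = sum((not r) and (not c) for r, c in pairs)  # correctly not recalled
    fp = sum(r and (not c) for r, c in pairs)      # recalled, no cancer
    return tp / (tp + fn), tn / (tn + fp)


def summarize(readers):
    """Summarize a dict mapping reader id -> (recalled, has_cancer).

    Dispersion is reported as the population standard deviation of the
    reader-specific estimates (an assumption made for illustration).
    """
    per_reader = {rid: sensitivity_specificity(*data)
                  for rid, data in readers.items()}
    sens = [s for s, _ in per_reader.values()]
    spec = [p for _, p in per_reader.values()]
    return {
        "per_reader": per_reader,
        "mean_sensitivity": mean(sens),
        "mean_specificity": mean(spec),
        "sd_sensitivity": pstdev(sens),
        "sd_specificity": pstdev(spec),
    }
```

Running `summarize` once on the clinical recall decisions and once on the laboratory ratings for the same cases would give the two sets of averages and spreads that the study compares.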

Results: On average, the radiologists' performance was significantly better in the clinic than in the laboratory (P = .035). Interreader dispersion of the computed performance levels was significantly lower during the clinical interpretations (P < .01).

Conclusion: Retrospective laboratory experiments may not accurately represent either the performance levels or the interreader variability expected when the same set of mammograms is interpreted in the clinical environment.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Clinical Competence*
  • Female
  • Humans
  • Laboratories
  • Mammography* / standards
  • Retrospective Studies
  • Sensitivity and Specificity