Rationale and objectives: Although research has successfully documented variability in radiologists' interpretation of mammograms, it has failed to determine the relative contributions of case-specific features and reader inconsistency. Training interventions to improve consistency will be ineffectual if they do not target the principal determinants of disagreement among radiologists. The current study assessed the relative contributions of the case and the interpreter to the problem of inconsistent interpretation.
Materials and methods: One hundred ten radiologists independently interpreted mammograms from the same 148 screening cases (43% with biopsy-proved cancers) and reported the presence or absence of calcifications, mass, architectural distortion, and asymmetric density in each of 296 breasts. The radiologists were blinded to disease status (established at biopsy or follow-up).
Results: Case-related differences accounted for a greater proportion of interpretation disagreement than did differences between interpreters. The presence of cancer was associated with increased disagreement, perhaps because of the multiplicity of findings. Patient age was also associated with increased disagreement in the reporting of calcifications.
Conclusion: For screening mammography, increased consistency between radiologists in their recognition and reporting of clinically important findings will best be achieved by reducing disagreement in difficult cases. Current training in the United States addresses difficult cases only as they have been defined intuitively or experientially. The authors' population-based method provides an objective metric to measure case difficulty and a basis from which to identify difficult cases for targeted training.
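Illustrative sketch: The abstract does not specify the statistical model used, so the following is only a minimal illustration of the general idea, not the authors' method. It assumes a toy matrix of binary reports (rows are readers, columns are cases) for a single finding, scores each case by the fraction of reader pairs that disagree, and splits the total variation in reports into case, reader, and residual shares with a simple two-way sum-of-squares decomposition. The function names and the data are hypothetical.

```python
def pairwise_disagreement(reports):
    """Per-case fraction of reader pairs whose reports differ (0 = full agreement)."""
    n_readers = len(reports)
    n_cases = len(reports[0])
    n_pairs = n_readers * (n_readers - 1) / 2
    scores = []
    for j in range(n_cases):
        k = sum(row[j] for row in reports)  # readers reporting the finding
        scores.append(k * (n_readers - k) / n_pairs)  # discordant pairs / all pairs
    return scores


def variation_shares(reports):
    """Share of total sum of squares attributable to cases, readers, and residual."""
    n_readers = len(reports)
    n_cases = len(reports[0])
    grand = sum(sum(row) for row in reports) / (n_readers * n_cases)
    reader_means = [sum(row) / n_cases for row in reports]
    case_means = [sum(row[j] for row in reports) / n_readers for j in range(n_cases)]
    ss_total = sum((x - grand) ** 2 for row in reports for x in row)
    ss_reader = n_cases * sum((m - grand) ** 2 for m in reader_means)
    ss_case = n_readers * sum((m - grand) ** 2 for m in case_means)
    return {
        "case": ss_case / ss_total,
        "reader": ss_reader / ss_total,
        "residual": (ss_total - ss_case - ss_reader) / ss_total,
    }


# Hypothetical toy data: 4 readers (rows) x 4 cases (columns), 1 = finding reported.
reports = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]

disagreement = pairwise_disagreement(reports)
shares = variation_shares(reports)
print(disagreement)
print(shares)
```

In this toy example the first two cases draw unanimous reports and score zero, while the third case splits the readers evenly and scores highest; ranking cases by such a score is one plausible way to flag difficult cases for targeted training.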