Experimental determination of subjective similarity for pairs of clustered microcalcifications on mammograms: observer study results

Med Phys. 2006 Sep;33(9):3460-8. doi: 10.1118/1.2266280.


Presenting images of lesions similar to an unknown lesion might help radiologists distinguish between benign and malignant clustered microcalcifications on mammograms. Investigators have been developing computerized schemes to select similar images from large databases. However, for most of these schemes, whether the selected images are actually similar in appearance has not been examined. To retrieve images that are useful to radiologists, the selected images must be similar from the radiologists' diagnostic point of view. Therefore, in this study, subjective similarity ratings for pairs of clustered microcalcification images were obtained from a number of observers, and the intra- and interobserver variations and the intergroup correlations were determined to investigate whether human observers can provide reliable similarity ratings. Nineteen images of clustered microcalcifications, each paired with six other images, were selected for the observer study; thus, each observer rated the subjective similarity of 114 pairs. Thirteen breast radiologists, ten general radiologists, and ten nonradiologists participated in the observer study; some of them completed the study multiple times. Although the intraobserver variations for individual readings and the interobserver variations for pairs of observers were not small, interobserver agreement improved when the readings of the same observers were averaged. When the similarity ratings of a number of observers were averaged within each group (breast radiologists, general radiologists, and nonradiologists), the mean differences in ratings between the groups decreased, and good concordance correlations (0.846, 0.817, and 0.785) between the groups were obtained.
These results indicate that reliable similarity ratings can be obtained with this method and that the average similarity ratings of breast radiologists can be considered meaningful and useful for developing and evaluating computerized schemes for selecting similar images.
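The intergroup "concordance correlations" are presumably Lin's concordance correlation coefficient (CCC). Unlike Pearson's r, the CCC penalizes systematic differences in mean ratings between groups. The abstract does not name the statistic, so this is an assumption. The minimal Python sketch below averages each pair's rating across the observers in a group and then computes the CCC between two groups' mean ratings. The function names and toy ratings are hypothetical and are not data from the study.

```python
from statistics import fmean


def group_mean_ratings(ratings_by_observer):
    """Average each image pair's similarity rating across the observers in a group.

    ratings_by_observer: one list per observer, all listing the pairs in the same order
    (hypothetical data layout, e.g. 114 ratings per observer).
    """
    n_pairs = len(ratings_by_observer[0])
    if any(len(obs) != n_pairs for obs in ratings_by_observer):
        raise ValueError("every observer must rate the same pairs")
    return [fmean(obs[i] for obs in ratings_by_observer) for i in range(n_pairs)]


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using population (divide-by-n) moments.

    CCC = 2*s_xy / (s_xx + s_yy + (mean_x - mean_y)**2), so a constant offset
    between the two rating sets lowers the CCC even when Pearson's r is 1.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


if __name__ == "__main__":
    # Hypothetical toy ratings: two observers per group, three image pairs.
    group_a = group_mean_ratings([[1, 2, 3], [1, 2, 3]])
    group_b = group_mean_ratings([[2, 3, 4], [2, 3, 4]])
    print(group_a, group_b, round(lin_ccc(group_a, group_b), 3))
```

Take the hypothetical group means [1, 2, 3] and [2, 3, 4]. Pearson's r is 1, but the CCC is 4/7 ≈ 0.571 because of the constant offset of one rating unit. This illustrates why smaller mean differences between groups raise the concordance correlation.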

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Breast Diseases / diagnostic imaging*
  • Breast Diseases / epidemiology
  • Calcinosis / diagnostic imaging*
  • Calcinosis / epidemiology
  • Cluster Analysis
  • Female
  • Humans
  • Mammography / statistics & numerical data*
  • Observer Variation*
  • Pattern Recognition, Visual*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Task Performance and Analysis*