Objective: We sought to evaluate the use of the Breast Imaging Reporting and Data System (BI-RADS) standardized mammography lexicon among and within observers and to distinguish variability in feature analysis from variability in lesion management.
Materials and methods: Five experienced mammographers, none specifically trained in BI-RADS, used the lexicon to describe and assess 103 screening mammograms, including 30 (29%) showing cancer, and a subset of 86 mammograms with diagnostic evaluation, including 23 (27%) showing cancer. A subset of 13 screening mammograms (two with malignant findings, 11 with diagnostic evaluation) was rereviewed by each observer 2 months later. Kappa statistics were calculated as measures of agreement beyond chance.
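As an illustration of agreement beyond chance among several observers, the following is a minimal sketch of Fleiss' kappa. The abstract does not state which kappa variant was used, so the choice of Fleiss' formulation is an assumption, and the function name and the example labels are hypothetical rather than study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list with one label per rater.

    Assumes every case was rated by the same number of raters (at least two).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for case in ratings for label in case})

    # For each case, count how many raters chose each category.
    counts = [Counter(case) for case in ratings]

    # Observed agreement per case, then averaged over all cases.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_case) / n_cases

    # Agreement expected by chance, from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    expected = sum(p ** 2 for p in proportions)

    if expected == 1.0:
        # Every rating fell in a single category; kappa is undefined,
        # and this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical lesion-type labels from five observers (not study data).
example = [
    ["mass", "mass", "mass", "mass", "mass"],
    ["calcification", "calcification", "calcification", "calcification", "calcification"],
]
kappa = fleiss_kappa(example)
```

A kappa of 1 indicates complete agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.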
Results: After diagnostic evaluation, the interobserver kappa values for describing features were as follows: breast density, 0.43; lesion type, 0.75; mass borders, 0.40; special cases, 0.56; mass density, 0.40; mass shape, 0.28; microcalcification morphology, 0.36; and microcalcification distribution, 0.47. Lesion management was highly variable, with a kappa value for final assessment of 0.37. When we grouped assessments recommending immediate additional evaluation or biopsy (BI-RADS categories 0, 4, and 5 combined) versus follow-up (categories 1, 2, and 3 combined), all five observers agreed on management for only 47 (55%) of 86 lesions. Intraobserver agreement on management (additional evaluation or biopsy versus follow-up) was seen in 47 (85%) of 55 interpretations, with kappa values for final assessment ranging from 0.35 to 1.0 (mean, 0.60).
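The management grouping described above can be sketched as follows. The category split (0, 4, and 5 versus 1, 2, and 3) comes from the text; the function names and the example assessments are hypothetical and are not the study data.

```python
# BI-RADS final assessment categories grouped by management, as in the text.
IMMEDIATE_ACTION = {0, 4, 5}  # additional evaluation or biopsy
FOLLOW_UP = {1, 2, 3}         # follow-up


def management_group(category):
    """Map a BI-RADS final assessment category to its management group."""
    if category in IMMEDIATE_ACTION:
        return "action"
    if category in FOLLOW_UP:
        return "follow_up"
    raise ValueError(f"unexpected BI-RADS category: {category}")


def unanimous_agreement(assessments):
    """Return (count, fraction) of cases where all observers chose the same group."""
    agreed = sum(
        1
        for case in assessments
        if len({management_group(category) for category in case}) == 1
    )
    return agreed, agreed / len(assessments)


# Hypothetical assessments: one row per lesion, one category per observer.
example = [
    [4, 5, 0, 4, 4],  # all recommend immediate action
    [1, 2, 3, 2, 1],  # all recommend follow-up
    [3, 4, 3, 3, 3],  # observers split on management
]
agreed, fraction = unanimous_agreement(example)
```

Applied to the reported figures, the same calculation gives 47 / 86, or about 55%, for interobserver agreement on management.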
Conclusion: Inter- and intraobserver variability in mammographic interpretation is substantial for both feature analysis and management. Continued development of methods to improve standardization in mammographic interpretation is needed.