Background: Screening mammography has lower sensitivity and specificity for women with increased breast density, who also have a higher risk of breast cancer.
Purpose: To systematically review the evidence on the accuracy and reproducibility of the Breast Imaging-Reporting and Data System (BI-RADS) breast density assessment scores, as well as the evidence on test performance and clinical outcomes of supplemental screening with hand-held ultrasound (HHUS), automated whole breast ultrasound (ABUS), magnetic resonance imaging (MRI), and digital breast tomosynthesis (DBT) for women with dense breasts and negative screening mammography.
Data Sources: We searched MEDLINE, PubMed, Embase, and the Cochrane Library from January 2000 through July 2015. We reviewed the reference lists of included studies and relevant systematic reviews to identify relevant articles that were published before the timeframe or not identified in our literature searches. We also searched the grey literature for relevant reports and reviewed their references, and identified articles based on suggestions from experts.
Study Selection: Two reviewers independently reviewed the titles and abstracts of all identified articles to determine if studies met the inclusion and exclusion criteria. All studies were required to report the study population and results for women with BI-RADS breast density c/d or equivalent. Two reviewers then independently evaluated the potentially relevant full-text articles against a priori inclusion and exclusion criteria. Disagreements in the abstract and/or full-text review were resolved through consensus discussion.
Data Extraction: A single reviewer independently abstracted study characteristics and results into tables. A second reviewer independently reviewed each study and checked tables for accuracy. Subgroups of women with dense breasts were abstracted separately when reported or if data were provided by study authors.
Data Analysis: Evidence for all key questions was qualitatively synthesized. Sensitivity, specificity, positive and negative predictive value, cancer detection rates, recall rates, and biopsy rates were calculated for individual study subgroups of women with dense breasts. For each study estimate of sensitivity, specificity, cancer detection rate, and biopsy rate, 95% confidence intervals were calculated by the exact method.
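The calculations described in the Data Analysis step can be illustrated with a minimal sketch. It assumes that "the exact method" refers to the Clopper-Pearson binomial interval, which the abstract does not state explicitly. The function names (`accuracy_measures`, `exact_ci`) and the 2x2 counts in the usage lines are hypothetical and are not taken from any included study.

```python
from math import comb


def accuracy_measures(tp, fp, fn, tn):
    """Test performance measures from a 2x2 table of screening results."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers correctly flagged
        "specificity": tn / (tn + fp),  # non-cancers correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for n up to about 1,000."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05, tol=1e-10):
    """Clopper-Pearson interval for k events in n trials (assumed method)."""

    def solve(f, target):
        # f decreases as p increases, so bisect for f(p) == target on [0, 1].
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fp, fn, tn = 8, 92, 2, 898
    print(accuracy_measures(tp, fp, fn, tn))
    print("sensitivity 95% CI:", exact_ci(tp, tp + fn))
```

With few cancers in a dense-breast subgroup, the interval around sensitivity is wide even when the point estimate looks high, which is one reason the exact interval is preferred over a normal approximation for small counts.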
Results: There is no recognized gold standard for breast density determination, so no studies were identified that evaluated the accuracy of the BI-RADS breast density categories in screening mammography. Five studies reported statistical measures for the reproducibility of categorical BI-RADS breast density classification among women predominantly or exclusively receiving screening mammograms. Best estimates from U.S. data suggest about one in five women would be categorized into a different BI-RADS density category (a, b, c, d) by the same radiologist at the next screening exam, while one in three would be categorized differently if the next screening exam were read by a different radiologist. Major re-categorization (i.e., from “dense” categories [c or d] to “non-dense” categories [a or b], or vice versa) at the next screening examination occurred in 12.6 to 18.7 percent of women. For test performance characteristics of supplemental screening of women with dense breasts and negative screening mammography, two good-quality and three fair-quality studies reported on HHUS, one fair-quality study reported on ABUS, and three good-quality studies reported on MRI. We identified no studies of DBT performance among women with dense breasts and negative screening mammography. In the good-quality HHUS studies, for all breast cancer (defined as including DCIS and invasive breast cancer) the sensitivity ranged from 80.0 to 83.0 percent, specificity ranged from 86.4 to 94.5 percent and positive predictive value (PPV) ranged from 3.2 to 7.5 percent. For ABUS, the sensitivity was 67.6 percent, specificity was 91.6 percent, and PPV was 4.1 percent. In the three MRI studies, which were smaller and included high-risk women, the sensitivity ranged from 75.0 to 100.0 percent, specificity ranged from 78.1 to 88.7 percent and PPV ranged from 3.0 to 33.3 percent. 
No studies were identified that examined the impact of supplemental screening on breast cancer recurrence rates or mortality for women with dense breasts. We identified observational studies that reported breast cancer detection rates, recall rates, and biopsy rates: ten studies of HHUS (two good-quality), three fair-quality studies of ABUS, three good-quality studies of MRI, and four fair-quality studies of DBT. Most studies compared screening outcomes in the same cohort pre- and post-supplemental testing. One study of HHUS, two of ABUS, and three of DBT compared clinical outcomes of two groups of women undergoing mammography, with and without supplemental testing. Supplemental testing consistently found additional breast cancers not identified by mammography, but increased false-positive results, with the possible exception of DBT. The two good-quality studies of HHUS had consistent estimates of the incremental (additional after mammography) cancer detection rate: 4.4 per 1,000 exams. In the good-quality U.S. study, recall rates for additional imaging and/or biopsies were 139 per 1,000 exams; in the good-quality Italian study, the biopsy rate was 59 per 1,000 exams. In two fair-quality studies of ABUS, the cancer detection rates were 4.6 and 1.9 per 1,000 exams and recall was 87 and 150 per 1,000 exams. For MRI, incremental cancer detection rates ranged from 3.5 to 28.6 per 1,000 exams. Recall rates for additional diagnostic testing ranged from 115 to 235 per 1,000 exams. For DBT, cancer detection rates rose from 4.0 to 4.1 breast cancers per 1,000 exams with digital mammography alone to 5.4 to 6.6 breast cancers per 1,000 exams with added DBT. Recall rates declined with the addition of DBT in all studies: from 91 to 69 per 1,000 exams, from 72 to 66 per 1,000 exams, from 128 to 108 per 1,000 exams, and from 166 to 97 per 1,000 exams.
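As a rough illustration of how the DBT recall figures above can be compared across studies, the following sketch computes the absolute and relative change in recall per 1,000 exams for the four before/after pairs quoted in the text. The function name `recall_change` is hypothetical, and the derived percentages are simple arithmetic on the reported rates, not results reported by the review.

```python
# (recall without DBT, recall with DBT), per 1,000 exams, as quoted above.
dbt_recall = [(91, 69), (72, 66), (128, 108), (166, 97)]


def recall_change(before, after):
    """Return (absolute change per 1,000 exams, relative change as a fraction)."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


if __name__ == "__main__":
    for before, after in dbt_recall:
        absolute, relative = recall_change(before, after)
        print(f"{before} -> {after}: {absolute:+d} per 1,000 ({relative:+.1%})")
```

Because the studies differ in baseline recall, the relative change is often easier to compare across them than the absolute change.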
Across all modalities, invasive cancers (rather than ductal carcinoma in situ) comprised 89 to 93 percent of cancers detected by HHUS, 74 to 93 percent of cancers detected by ABUS, 67 to 86 percent of cancers detected by MRI, and 68 to 92 percent of those detected by DBT. We identified one RCT that assessed potential harms of breast density notification relative to a control group. No differences in psychological outcomes or intention for clinical breast exam were detected at 6 months. We found no studies on potential harms of receiving different breast density classification on sequential examinations. Harms of supplemental screening with ultrasound or MRI of women with dense breasts include higher recall and biopsy rates when compared with digital mammography alone. Harms of breast MRI include risk of nephrogenic systemic fibrosis for women with advanced chronic kidney disease. DBT use in conjunction with digital mammography more than doubles the radiation exposure of each combined screening exam.
Limitations: Studies of BI-RADS reproducibility may reflect somewhat older community practice. No studies examined long-term outcomes of supplemental screening for women with dense breasts. Many studies of test performance and proximate clinical outcomes were of fair quality, and most were conducted in cohorts of women with risk factors in addition to dense breasts. Six observational studies compared cohorts with and without supplemental screening, but only one employed statistical techniques to adjust for differences in baseline risk between groups.
Conclusions: Reproducibility of BI-RADS density determinations in U.S. community practice does not appear to be ideal. Mammograms from 12.6 to 18.7 percent of women were reclassified into a different overall combined category (i.e., from “non-dense” to “dense” or vice versa) at their next screening exam when read by the same or a different radiologist, which may introduce confusion or reduce confidence among women receiving mandated breast density notifications. This would affect certainty of any recommendation for supplemental screening of women identified as having dense breasts. Studies identifying more accurate and reproducible methods of identifying women with dense breasts are needed. There were no published studies of important longer-term clinical outcomes of supplemental screening. In general, supplemental screening of women with dense breasts will lead to the identification of more breast cancers (mostly invasive), but may be associated with higher recall rates and additional biopsies. Whether cancers identified by supplemental screening have better outcomes and how many of them represent cancers that would not otherwise become clinically apparent (overdiagnosis) cannot be determined from the studies published to date. Rigorous comparative studies of supplemental screening for women with dense breasts including clinical outcomes beyond breast cancer diagnosis are needed for all modalities.