Objective: To develop and validate a clinically informed algorithm that uses Medicare claims alone to identify incident breast cancer cases with a high positive predictive value.
Data source: Population-based Surveillance, Epidemiology, and End Results (SEER) Tumor Registry data linked to Medicare claims, and Medicare claims from a 5 percent random sample of beneficiaries in SEER areas.
Study design: An algorithm was developed using claims from breast cancer patients in the SEER-Medicare database who were diagnosed in 1995, together with 1995 claims from Medicare control subjects. The algorithm was validated on claims from breast cancer subjects diagnosed in 1994 and controls from 1994. Algorithm development combined clinical insight with logistic regression methods.
Data extraction: Training set: Claims from 7,700 SEER-Medicare breast cancer subjects diagnosed in 1995, and 124,884 controls. Validation set: Claims from 7,607 SEER-Medicare breast cancer subjects diagnosed in 1994, and 120,317 controls.
Principal findings: A four-step prediction algorithm was developed and validated. It has a positive predictive value of 89-93 percent and a sensitivity of 80 percent for identifying incident breast cancer. Sensitivity is 82-87 percent for stage I or II disease and lower for other stages. Sensitivity is 82-83 percent for women who underwent either breast-conserving surgery or mastectomy, and is similar across geographic sites. A cohort identified with this algorithm is expected to contain 89-93 percent incident breast cancer cases, 1.5-6 percent cancer-free cases, and 4-5 percent prevalent breast cancer cases.
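For readers less familiar with the two performance measures reported above, the following minimal sketch shows how sensitivity and positive predictive value are computed from a classification table. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce the four-step algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output with a reference standard.

    Hypothetical container: in a setting like the one described, the
    reference standard would be registry-confirmed incident cases.
    """
    true_pos: int   # flagged by the algorithm, incident case in the registry
    false_pos: int  # flagged by the algorithm, not an incident case
    false_neg: int  # not flagged, but an incident case in the registry
    true_neg: int   # not flagged, not an incident case


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true incident cases that the algorithm flags."""
    return c.true_pos / (c.true_pos + c.false_neg)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Share of flagged subjects who are true incident cases."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Illustrative counts only (assumed, not taken from the study).
example = ConfusionCounts(true_pos=800, false_pos=80, false_neg=200, true_neg=98920)

print(f"sensitivity = {sensitivity(example):.3f}")
print(f"PPV         = {positive_predictive_value(example):.3f}")
```

Note that positive predictive value, unlike sensitivity, depends on how common incident cases are among the subjects screened, so it shifts with the mix of cases and controls.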
Conclusions: This algorithm has better performance characteristics than previously proposed algorithms. The ability to examine national patterns of breast cancer care using Medicare claims data would open new avenues for the assessment of quality of care.