Purpose: This study assessed the accuracy of claims-based algorithms for identifying smoking, using self-reported smoking data as the reference.
Methods: Medicare patients enrolled in the Brigham and Women's Hospital Rheumatoid Arthritis Sequential Study were identified. For each patient, self-reported smoking status was extracted from the Brigham and Women's Hospital Rheumatoid Arthritis Sequential Study, and the date of this measurement was defined as the index date. Two algorithms identified smoking in Medicare claims: (i) using only diagnosis and procedure codes and (ii) using anti-smoking prescriptions in addition to diagnosis and procedure codes. Each algorithm was applied twice: first using only claims from the 365 days before the index date, and then using all available pre-index claims. Considering self-reported smoking status as the gold standard, we calculated specificity, sensitivity, positive predictive value, negative predictive value (NPV), and area under the curve (AUC).
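As an illustration of the accuracy measures named above, the following is a minimal sketch of how they can be computed from a 2x2 table that compares an algorithm's classification with self-report. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study. The AUC line assumes a single binary classifier, for which the area under the curve reduces to the mean of sensitivity and specificity; the abstract does not state how AUC or the confidence intervals were computed.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fn, fp, tn):
    """Accuracy measures for a binary algorithm against a gold standard.

    tp: smokers by self-report, flagged by the algorithm
    fn: smokers by self-report, not flagged
    fp: non-smokers by self-report, flagged
    tn: non-smokers by self-report, not flagged
    """
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    # Assumption: for a single yes/no classifier, the ROC curve has one
    # operating point, so AUC equals the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "auc": auc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not study data).
    for name, value in diagnostic_metrics(tp=6, fn=55, fp=0, tn=67).items():
        print(f"{name}: {value:.3f}")
```

With no false positives, specificity and positive predictive value are both 100% regardless of how many smokers are missed, which is why an algorithm can combine very high specificity with low sensitivity.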
Results: A total of 128 patients were included in this study, of whom 48% reported smoking. The algorithm using only diagnosis and procedure codes had the lowest sensitivity (9.8%, 95% CI 2.4%-17.3%), NPV (54.9%, 95% CI 46.1%-63.9%), and AUC (0.55, 95% CI 0.51-0.59) when applied to the 365-day pre-index period. Incorporating pharmacy claims and using all available pre-index information improved the sensitivity (27.9%, 95% CI 16.6%-39.1%), NPV (60.4%, 95% CI 51.3%-69.5%), and AUC (0.64, 95% CI 0.58-0.70). The specificity and positive predictive value were 100% for all the algorithms tested.
Conclusion: Claims-based algorithms can identify smokers with limited sensitivity but very high specificity. In the absence of other reliable means, use of a claims-based algorithm to identify smoking could be cautiously considered in observational studies.
Keywords: claims-based algorithm; pharmacoepidemiology; smoking; validation.
Copyright © 2016 John Wiley & Sons, Ltd.