Background: Chronic rhinosinusitis (CRS) is prevalent, morbid, and poorly understood. Extraction of electronic health record (EHR) data of patients with CRS may facilitate research on CRS. However, the accuracy of using structured billing codes for EHR-driven phenotyping of CRS is unknown. We sought to accurately identify CRS cases and controls using EHR data and to determine the accuracy of structured billing codes for identifying patients with CRS.
Methods: We developed and validated distinct algorithms to identify patients with CRS and controls using International Classification of Diseases, Ninth Revision (ICD-9) and Current Procedural Terminology codes. We used blinded clinician chart review as the reference standard to evaluate algorithm and billing code accuracy.
Results: Our initial control algorithm achieved a control positive predictive value (PPV) of 100% (i.e., negative predictive value of 100% for CRS). Our initial algorithm for CRS cases relied exclusively on billing codes and had a low case PPV (54%). Notably, ICD-9 code 471.x was associated with a case PPV of 85%, whereas the case PPV of ICD-9 code 473.x was only 34%. After multiple algorithm iterations, we increased the case PPV of our final algorithm to 91% by adding several requirements. One such requirement was that ICD-9 codes occur with 1 or more evaluations by a CRS specialist, which enhances the availability of objective clinical data for accurately phenotyping CRS.
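To make the PPV figures above concrete, the following is a minimal Python sketch of how a case PPV could be computed against chart review, first for a rule based on billing codes alone and then for a rule that also requires a specialist evaluation. It is an illustration under stated assumptions, not the study's algorithm: the `PatientRecord` fields, the function names, the prefix matching of codes, and the toy records are all hypothetical, and the final algorithm included further requirements not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical minimal record; field names are illustrative only.
    icd9_codes: list[str] = field(default_factory=list)
    specialist_visits: int = 0      # evaluations by a CRS specialist
    chart_review_crs: bool = False  # reference standard from chart review


# ICD-9 code families named in the text (471.x and 473.x).
# Matching on a string prefix is an assumption about how codes are stored.
CRS_PREFIXES = ("471.", "473.")


def has_crs_code(rec: PatientRecord) -> bool:
    """Billing-code-only rule: any 471.x or 473.x code on the record."""
    return any(code.startswith(CRS_PREFIXES) for code in rec.icd9_codes)


def flag_case(rec: PatientRecord, min_specialist_visits: int = 1) -> bool:
    """Simplified stricter rule: a CRS code plus 1 or more specialist evaluations."""
    return has_crs_code(rec) and rec.specialist_visits >= min_specialist_visits


def ppv(records, rule):
    """PPV = chart-confirmed flagged patients / all flagged patients."""
    flagged = [r for r in records if rule(r)]
    if not flagged:
        return None  # PPV is undefined when the rule flags nobody
    return sum(r.chart_review_crs for r in flagged) / len(flagged)


# Toy records invented for illustration; these are not study data.
records = [
    PatientRecord(["471.0"], specialist_visits=2, chart_review_crs=True),
    PatientRecord(["473.9"], specialist_visits=0, chart_review_crs=False),
    PatientRecord(["473.9"], specialist_visits=1, chart_review_crs=True),
    PatientRecord(["473.0"], specialist_visits=0, chart_review_crs=True),
    PatientRecord(["461.9"], specialist_visits=1, chart_review_crs=False),
]

print("code-only PPV:", ppv(records, has_crs_code))
print("code + specialist PPV:", ppv(records, flag_case))
```

In this toy set the stricter rule raises PPV but also drops one chart-confirmed patient who never saw a specialist, which illustrates the trade-off between PPV and case capture that a stricter rule can introduce.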
Conclusion: These algorithms are an important first step toward identifying patients with CRS and may facilitate EHR-based research on CRS pathogenesis, morbidity, and management. Exclusive use of coded data for phenotyping CRS has limited accuracy, especially because CRS symptomatology overlaps with that of other illnesses. Incorporating natural language processing (e.g., to evaluate results of nasal endoscopy or sinus computed tomography) into future work may increase algorithm accuracy and identify patients whose disease status may not be ascertainable from billing codes alone.