Rationale: Estimates of idiopathic pulmonary fibrosis (IPF) incidence and prevalence from electronic databases without case validation may be inaccurate.
Objectives: To develop claims-based algorithms for identifying IPF, to assess their positive predictive value (PPV), and to use them to estimate IPF incidence and prevalence in the United States.
Methods: We developed three algorithms to identify IPF cases in the HealthCore Integrated Research Database. Two of them, a sensitive algorithm and a specific algorithm, were based on a literature review and consultation with clinical experts, and their PPVs were assessed against medical records. The third algorithm used logistic regression modeling to generate an IPF score and was validated on a separate set of medical records. We estimated IPF incidence and prevalence using the sensitive algorithm, corrected for its PPV.
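The abstract does not report the terms or weights of the logistic regression model behind the IPF score. The sketch below shows only the general mechanics of such a score: a linear predictor passed through the logistic function, with patients at or above a cutoff flagged as likely IPF. The predictor names (`n_ipf_claims`, `had_chest_ct`, `age_ge_60`), the coefficients, and the 0.5 threshold are all hypothetical placeholders, not the study's model.

```python
import math

# HYPOTHETICAL predictors and coefficients, for illustration only.
# The study's actual model terms and weights are not given in the abstract.
HYPOTHETICAL_COEFFICIENTS = {
    "intercept": -3.0,
    "n_ipf_claims": 0.8,  # count of IPF-coded claims (assumed predictor)
    "had_chest_ct": 1.2,  # 1 if a chest CT claim exists (assumed predictor)
    "age_ge_60": 0.9,     # 1 if aged 60 or older (assumed predictor)
}


def ipf_score(patient: dict) -> float:
    """Return a predicted probability of IPF from a logistic model."""
    linear = HYPOTHETICAL_COEFFICIENTS["intercept"]
    for name, beta in HYPOTHETICAL_COEFFICIENTS.items():
        if name != "intercept":
            # Missing predictors are treated as 0.
            linear += beta * patient.get(name, 0)
    # Logistic (inverse-logit) transform of the linear predictor.
    return 1.0 / (1.0 + math.exp(-linear))


def flag_cases(patients: list, threshold: float = 0.5) -> list:
    """Return the patients whose score meets or exceeds the cutoff (assumed 0.5)."""
    return [p for p in patients if ipf_score(p) >= threshold]


if __name__ == "__main__":
    cohort = [
        {"n_ipf_claims": 2, "had_chest_ct": 1, "age_ge_60": 1},
        {"n_ipf_claims": 0, "had_chest_ct": 0, "age_ge_60": 1},
    ]
    for p in cohort:
        print(round(ipf_score(p), 3))
    print(len(flag_cases(cohort)))
```

In a validation study, the flagged patients would then be checked against medical records, and the share confirmed as IPF gives the score's PPV at that cutoff.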
Measurements and main results: We identified 4,598 patients with the sensitive algorithm and 2,052 patients with the specific algorithm. On medical record review, the PPVs of these algorithms based on the treating clinician's diagnosis were 44.4% and 61.7%, respectively, and the PPV of the IPF score was 76.2%. Based on the clinical adjudicator's diagnosis, the corresponding PPVs were 54% and 57.6%, and the PPV of the IPF score was 83.3%. The PPV-corrected incidence and period prevalence of IPF were 14.6 per 100,000 person-years and 58.7 per 100,000 persons, respectively.
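The PPV correction can be illustrated with simple arithmetic. PPV is the fraction of algorithm-identified patients confirmed as true cases on record review, and multiplying the algorithm's case count by the PPV before dividing by the population denominator removes the expected false positives. The sketch below assumes this multiplicative form of the correction; the counts and denominators are hypothetical and are not the study's data.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-flagged cases."""
    return true_positives / (true_positives + false_positives)


def rate_per_100k(cases: float, denominator: float) -> float:
    """Crude rate per 100,000 (person-years for incidence, persons for prevalence)."""
    return cases * 100_000 / denominator


def corrected_rate_per_100k(algorithm_cases: int, ppv_value: float,
                            denominator: float) -> float:
    """Rate after scaling the algorithm's case count by its PPV (assumed correction)."""
    return rate_per_100k(algorithm_cases * ppv_value, denominator)


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 40 of 100 reviewed charts confirmed; 500 algorithm
    # cases among 1,000,000 person-years of follow-up.
    p = ppv(40, 60)
    print(p)
    print(rate_per_100k(500, 1_000_000))
    print(corrected_rate_per_100k(500, p, 1_000_000))
```

With these inputs the uncorrected rate is 2.5 times the corrected one, which is the kind of overestimation the conclusions attribute to uncorrected sensitive algorithms.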
Conclusions: Sensitive algorithms without correction for false-positive errors overestimated the incidence and prevalence of IPF. The IPF score had the highest PPV but requires further validation.
Keywords: epidemiology; idiopathic pulmonary fibrosis; interstitial lung disease; validation.