Background: Real-world big data studies using health insurance claims databases require extraction algorithms to accurately identify target population and outcome. However, no algorithm for Crohn's disease (CD) has yet been validated. In this study we aim to develop an algorithm for identifying CD using the claims data of the insurance system.
Methods: A single-center retrospective study to develop a CD extraction algorithm from insurance claims data was conducted. Patients visiting the Kitasato University Kitasato Institute Hospital between January 2015-February 2019 were enrolled, and data were extracted according to inclusion criteria combining the Tenth Revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) diagnosis codes with or without prescription or surgical codes. Hundred cases that met each inclusion criterion were randomly sampled and positive predictive values (PPVs) were calculated according to the diagnosis in the medical chart. Of all cases, 20% were reviewed in duplicate, and the inter-observer agreement (Kappa) was also calculated.
Results: From the 82,898 enrolled, 255 cases were extracted by diagnosis code alone, 197 by the combination of diagnosis and prescription codes, and 197 by the combination of diagnosis codes and prescription or surgical codes. The PPV for confirmed CD cases was 83% by diagnosis codes alone, but improved to 97% by combining with prescription codes. The inter-observer agreement was 0.9903.
Conclusions: Single ICD-code alone was insufficient to define CD; however, the algorithm that combined diagnosis codes with prescription codes indicated a sufficiently high PPV and will enable outcome-based research on CD using the Japanese claims database.