Validation study in four health-care databases: upper gastrointestinal bleeding misclassification affects precision but not magnitude of drug-related upper gastrointestinal bleeding risk

J Clin Epidemiol. 2014 Aug;67(8):921-31. doi: 10.1016/j.jclinepi.2014.02.020. Epub 2014 May 1.


Objective: To evaluate the accuracy of disease codes and free text in identifying upper gastrointestinal bleeding (UGIB) from electronic health-care records (EHRs).

Study design and setting: We conducted a validation study in four European electronic health-care record (EHR) databases: Integrated Primary Care Information (IPCI), Health Search/CSD Patient Database (HSD), ARS, and Aarhus. In each database, we identified UGIB cases using free text or disease codes: (1) International Classification of Diseases (ICD)-9 (HSD, ARS); (2) ICD-10 (Aarhus); and (3) International Classification of Primary Care (ICPC) (IPCI). From each database, we randomly selected and manually reviewed 200 cases to calculate positive predictive values (PPVs). We employed different case definitions to assess the effect of outcome misclassification on estimation of risk of drug-related UGIB.
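
As an illustration of the PPV calculation described above, the following is a minimal sketch only. The abstract does not state which confidence-interval method was used, so a Wilson score interval is assumed here; the function name and the counts (182 confirmed of 200 reviewed) are hypothetical and are not taken from the study data.

```python
import math


def ppv_with_wilson_ci(confirmed, reviewed, z=1.959964):
    """Return (ppv, lower, upper) for a validation sample.

    confirmed: reviewed cases judged to be true UGIB on manual review
    reviewed:  all cases sampled and reviewed
    z:         normal quantile (1.959964 gives a 95% interval)

    Assumption: Wilson score interval; the source does not name its method.
    """
    if reviewed <= 0 or not 0 <= confirmed <= reviewed:
        raise ValueError("need 0 <= confirmed <= reviewed and reviewed > 0")

    p = confirmed / reviewed  # point estimate of the PPV
    z2 = z * z
    denom = 1 + z2 / reviewed
    center = (p + z2 / (2 * reviewed)) / denom
    half_width = z * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical counts for illustration only (not the study's data).
ppv, lower, upper = ppv_with_wilson_ci(confirmed=182, reviewed=200)
print(f"PPV = {ppv:.0%} (95% CI: {lower:.0%}, {upper:.0%})")
```

An exact (Clopper-Pearson) interval would be an equally reasonable choice and can give slightly different bounds at this sample size.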

Results: PPV was 22% (95% confidence interval [CI]: 16, 28) and 21% (95% CI: 16, 28) in IPCI for free text and ICPC codes, respectively. PPV was 91% (95% CI: 86, 95) for ICD-9 codes and 47% (95% CI: 35, 59) for free text in HSD. PPV was 72% (95% CI: 65, 78) for ICD-9 codes in ARS and 77% (95% CI: 69, 83) for ICD-10 codes in Aarhus. More specific case definitions did not significantly change the risk estimates for drug-related UGIB but yielded wider CIs.

Conclusions: ICD-9-CM and ICD-10 disease codes have good PPV in identifying UGIB from EHRs; less granular terminology (ICPC) may require additional strategies. Use of more specific UGIB definitions affects the precision, but not the magnitude, of risk estimates.

Keywords: Drug safety; Non-steroidal anti-inflammatory agents; Positive predictive value; Signal detection; Upper gastrointestinal bleeding; Validation study.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Algorithms
  • Confidence Intervals
  • Databases, Factual
  • Drug-Related Side Effects and Adverse Reactions / diagnosis
  • Drug-Related Side Effects and Adverse Reactions / epidemiology
  • Electronic Health Records*
  • Europe / epidemiology
  • Female
  • Gastrointestinal Hemorrhage / chemically induced*
  • Gastrointestinal Hemorrhage / classification*
  • Gastrointestinal Hemorrhage / diagnosis
  • Gastrointestinal Hemorrhage / epidemiology
  • Humans
  • International Classification of Diseases / standards*
  • Male
  • Middle Aged
  • Pharmacoepidemiology
  • Predictive Value of Tests
  • Reproducibility of Results
  • Risk Assessment