Background: Individual case reports of suspected harm from medicines are fundamental for signal detection in postmarketing surveillance. Their effective analysis requires reliable data and one challenge is report duplication. These are multiple unlinked records describing the same suspected adverse drug reaction (ADR) in a particular patient. They distort statistical screening and can mislead clinical assessment. Many organisations rely on rule-based detection, but probabilistic record matching is an alternative.
Objectives: The aim of this study was to evaluate probabilistic record matching for duplicate detection, and to characterise the main sources of duplicate reports within each data set.
Research design: vigiMatch™, a published probabilistic record matching algorithm, was applied to the WHO global individual case safety reports database, VigiBase(®), for reports submitted between 2000 and 2010. Reported drugs, ADRs, patient age, sex, country of origin, and date of onset were considered in the matching. Suspected duplicates for the UK, Denmark, and Spain were reviewed and classified by the respective national centre. This included evaluation to determine whether confirmed duplicates had already been identified by in-house, rule-based screening. Furthermore, each confirmed duplicate was classified with respect to the likely source of duplication.
Measures: For each country, the proportions of suspected duplicates classified as confirmed duplicates, likely duplicates, otherwise related, and unrelated were obtained. The proportions of confirmed or likely duplicates that were not previously known by the national organisation were determined, and variations in the rates of suspected duplicates across subsets of reports were characterised.
Results: Overall, 2.5 % of the reports with sufficient information to be evaluated by vigiMatch were classified as suspected duplicates. The rates for the three countries considered in this study were 1.4 % (UK), 1.0 % (Denmark), and 0.7 % (Spain). Higher rates of suspected duplicates were observed for literature reports (11 %) and reports with fatal outcome (5 %), whereas a lower rate was observed for reports from consumers and non-health professionals (0.5 %). The predictive value for confirmed or likely duplicates among reports flagged as suspected duplicates by vigiMatch ranged from 86 % for the UK, to 64 % for Denmark and 33 % for Spain. The proportions of confirmed duplicates that were previously unknown to national centres ranged from 89 % for Spain, to 60 % for the UK and 38 % for Denmark, despite in-house duplicate detection processes in routine use. The proportion of unrelated cases among suspected duplicates were below 10 % for each national centre in the study.
Conclusions: Probabilistic record matching, as implemented in vigiMatch, achieved good predictive value for confirmed or likely duplicates in each data source. Most of the false positives corresponded to otherwise related reports; less than 10 % were altogether unrelated. A substantial proportion of the correctly identified duplicates had not previously been detected by national centre activity. On one hand, vigiMatch highlighted duplicates that had been missed by rule-based methods, and on the other hand its lower total number of suspected duplicates to review improved the accuracy of manual review.