Background: Trial registries were established to combat publication bias by creating a comprehensive and unambiguous record of initiated clinical trials. However, the proliferation of registries and registration policies means that a single trial may be registered multiple times (so-called "duplicates"). Because unidentified duplicates threaten our ability to identify trials unambiguously, we investigated the extent to which duplicates have been identified across registries worldwide.
Methods: We retrieved all records from the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) search portal and compiled a list of all records that the ICTRP had identified as duplicates. To determine how best to discriminate duplicates from non-duplicates, we applied text-based similarity scoring to various registration fields of both ICTRP-identified duplicate pairs and arbitrary pairs of trials. We then used the best-performing similarity measure to find the most similar pairs of records, and manually assessed a random sample of pairs not identified as duplicates by the ICTRP to estimate the number of previously unidentified (or "hidden") duplicates.
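The abstract does not say which similarity measure was used, so the following is only a minimal sketch of pairwise title scoring under an assumed measure: Jaccard similarity over lowercased word tokens. The function names, record IDs, and titles are hypothetical and for illustration only. The brute-force loop compares all n(n-1)/2 pairs, which is fine for a toy input but would need blocking or indexing at the scale of the full ICTRP dataset.

```python
import itertools
import re


def normalize_tokens(title: str) -> set[str]:
    """Lowercase a title and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two titles' token sets (0.0 to 1.0).

    Assumed measure for illustration; not necessarily the one used in the study.
    """
    ta, tb = normalize_tokens(a), normalize_tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(records: dict[str, str], threshold: float = 0.5):
    """Yield (id_a, id_b, score) for every pair whose title similarity exceeds threshold.

    Brute force over all n*(n-1)/2 pairs; far too slow for hundreds of
    thousands of records without blocking or an index.
    """
    for (id_a, t_a), (id_b, t_b) in itertools.combinations(records.items(), 2):
        score = jaccard(t_a, t_b)
        if score > threshold:
            yield id_a, id_b, score


# Hypothetical registry records: ID -> public title.
records = {
    "REG-A-001": "Aspirin for Prevention of Stroke in Older Adults",
    "REG-B-002": "Aspirin for the prevention of stroke in older adults: a randomised trial",
    "REG-C-003": "Exercise Therapy for Chronic Low Back Pain",
}

for id_a, id_b, score in candidate_pairs(records):
    print(id_a, id_b, round(score, 3))  # REG-A-001 REG-B-002 0.667
```

Pairs flagged this way are only candidates; as the Methods describe, a sample of them still has to be assessed manually to confirm whether they are true duplicates.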
Results: We retrieved 285,000 unique records from the ICTRP portal in April 2015, corresponding to 271,000 unique trials after accounting for known duplicates. We found that the title field best discriminated duplicates from non-duplicates. Of approximately 41 billion pairwise comparisons, we identified the 474,000 pairs of titles with the highest similarity scores (>0.5). After manually assessing a random sample of 434 pairs, we estimated that 45% of all duplicate registrations currently go undetected and remain to be identified and confirmed as duplicates. Thus, the actual number of unique trials represented in this dataset is estimated to be approximately 258,000 (5% fewer).
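The headline figures can be checked with simple arithmetic. The sketch below uses the rounded counts reported above, so its results are approximate: comparing every record with every other gives n(n-1)/2 pairs, and the drop from 271,000 to 258,000 unique trials is just under 5%.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n choose 2."""
    return n * (n - 1) // 2


records = 285_000           # unique records retrieved (rounded figure from the abstract)
known_unique = 271_000      # unique trials after accounting for ICTRP-identified duplicates
estimated_unique = 258_000  # estimate after also accounting for hidden duplicates

pairs = pair_count(records)
extra_duplicates = known_unique - estimated_unique
reduction = extra_duplicates / known_unique

print(f"pairwise comparisons: {pairs:,}")                   # 40,612,357,500 (about 41 billion)
print(f"implied extra duplicates: {extra_duplicates:,}")    # 13,000
print(f"reduction in unique trials: {reduction:.1%}")       # 4.8%
```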
Conclusions: The ICTRP portal does not currently enable the unambiguous identification of trials across registries. Further research is needed to identify and verify the duplicates that currently go undetected. Sponsors, registries, and the ICTRP should consider actions to ensure duplicate registrations are easily identifiable.
Keywords: Clinical trials; Duplicate registrations; Trial registration.