Preprints promote the open and fast communication of non-peer reviewed work. Once a preprint is published in a peer-reviewed venue, the preprint server updates its web page: a prominent hyperlink leading to the newly published work is added. Linking preprints to publications is of utmost importance as it provides readers with the latest version of a now certified work. Yet leading preprint servers fail to identify all existing preprint-publication links. This limitation calls for a more thorough approach to this critical information retrieval task: overlooking published evidence translates into partial and even inaccurate systematic reviews on health-related issues, for instance. We designed an algorithm leveraging the Crossref public and free source of bibliographic metadata to comb the literature for preprint-publication links. We tested it on a reference preprint set identified and curated for a living systematic review on interventions for preventing and treating COVID-19 performed by international collaboration: the COVID-NMA initiative (covid-nma.com). The reference set comprised 343 preprints, 121 of which appeared as a publication in a peer-reviewed journal. While the preprint servers identified 39.7% of the preprint-publication links, our linker identified 90.9% of the expected links with no clues taken from the preprint servers. The accuracy of the proposed linker is 91.5% on this reference set, with 90.9% sensitivity and 91.9% specificity. This is a 16.26% increase in accuracy compared to that of preprint servers. We release this software as supplementary material to foster its integration into preprint servers' workflows and enhance a daily preprint-publication chase that is useful to all readers, including systematic reviewers. This preprint-publication linker currently provides day-to-day updates to the biomedical experts of the COVID-NMA initiative.
Keywords: COVID-19; Data linking; Living systematic review; Preprint; Publication.
© The Author(s) 2021.