Patents are essential for translating scientific discoveries into meaningful products that benefit society. While the academic community relies on citation counts to rank scholarly works according to their "scientific merit," the number of citations is unrelated to a work's relevance for patentable innovation. To explore associations between patents and scholarly works in publicly available patent data, we propose to use statistical methods that are commonly applied in biology to determine gene-disease associations. We illustrate their use on patents related to biotechnological trends of high relevance for food safety and ecology, namely CRISPR-based gene editing technology (>60,000 patents) and cyanobacterial biotechnology (>33,000 patents). Innovation trends are identified by unexpectedly large changes in patent numbers in a time-series analysis. From the total set of scholarly works referenced by all investigated patents (~254,000 publications), we identified ~1,000 scholarly works that are statistically significantly over-represented in the references of patents from changing innovation trends that concern immunology, agricultural plant genomics, and biotechnological engineering methods. The detected associations are consistent with the technical requirements of the respective innovations. In summary, the presented data-driven analysis workflow can identify scholarly works that were required for changes in innovation trends and is therefore of interest to researchers who would like to evaluate the relevance of publications beyond the number of citations.
Keywords: big data; biotechnology; literature associations; open data; patents.
Copyright © 2024 Geissler, Gorodkin and Seemann.
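As an illustration of the kind of over-representation statistic the abstract alludes to, the following is a minimal sketch and not the authors' workflow. It assumes a one-sided hypergeometric test, a common choice for gene-set enrichment in biology; the abstract does not state which test was actually used. The function name, its parameters, and the toy counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    Hypothetical parameter meaning (an assumption, not from the source):
      N: total number of patents under study
      K: patents (out of N) that cite a given scholarly work
      n: patents (out of N) that belong to a changing innovation trend
      k: patents in the trend that cite the work
    """
    upper = min(n, K)
    # Number of ways to pick n trend patents with at least k citing ones.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts, chosen only for illustration.
p_value = overrepresentation_pvalue(k=2, n=3, K=4, N=10)
print(f"p = {p_value:.4f}")
```

A small p-value would suggest that the work is cited by trend patents more often than expected by chance. Testing many scholarly works in this way would additionally require a multiple-testing correction, which this sketch omits.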