Graph-based methods for Author Name Disambiguation: a survey

PeerJ Comput Sci. 2023 Sep 11:9:e1536. doi: 10.7717/peerj-cs.1536. eCollection 2023.

Abstract

Scholarly knowledge graphs (SKGs) are knowledge graphs that represent research-related information, powering discovery and statistics about research impact and trends. Author name disambiguation (AND) is required to produce high-quality SKGs, as a disambiguated set of authors is essential for ensuring a coherent view of researchers' activity. Issues such as homonymy, scarcity of contextual information, and the cardinality of the SKG make simple name string matching insufficient or computationally expensive. Many deep learning methods for AND have been developed, and several surveys in the literature compare these approaches in terms of techniques, complexity, performance, etc. However, none of them specifically addresses AND methods in the context of SKGs, where the entity-relationship structure can be exploited. In this paper, we discuss recent graph-based methods for AND, define a framework through which such methods can be compared, and catalog the most popular datasets and benchmarks used to test them. Finally, we outline possible directions for future work on this topic.

Keywords: Author name disambiguation; Deduplication; Disambiguation.

Grants and funding

This work was co-funded by the EU H2020 projects OpenAIRE-Nexus (Grant agreement ID: 101017452) and EOSC-Future (Grant agreement ID: 101017536). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.