Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling

Maxime Rivest; Etienne Vignola-Gagné; Éric Archambault

doi:10.1371/journal.pone.0251493

PLoS One. 2021 May 11;16(5):e0251493. doi: 10.1371/journal.pone.0251493. eCollection 2021.

Authors

Maxime Rivest^{1,2}, Etienne Vignola-Gagné^{1,2}, Éric Archambault^{1,2,3}

Affiliations

¹ Science-Metrix Inc., Montréal, Québec, Canada.
² Elsevier B.V., Amsterdam, Netherlands.
³ 1science, Montréal, Québec, Canada.

Abstract

Classification schemes for scientific activity and publications underpin a large swath of research evaluation practices at the organizational, governmental, and national levels. Several research classifications are currently in use, and they require continuous work as new classification techniques become available and as new research topics emerge. Convolutional neural networks, a subset of "deep learning" approaches, have recently offered novel and highly performant methods for classifying voluminous corpora of text. This article benchmarks a deep learning classification technique on more than 40 million scientific articles and on tens of thousands of scholarly journals. The comparison is performed against bibliographic coupling-, direct citation-, and manual-based classifications: the established and most widely used approaches in the field of bibliometrics and, by extension, in many science and innovation policy activities such as grant competition management. The results reveal that the performance of this first iteration of a deep learning approach is equivalent to that of the graph-based bibliometric approaches. All methods presented are also on par with manual classification. Somewhat surprisingly, no machine learning approach was found to clearly outperform the simple label propagation approach that is direct citation. In conclusion, deep learning is promising because it performed just as well as the other approaches but has more flexibility to be further improved. For example, a deep neural network incorporating information from the citation network is likely to hold the key to an even better classification algorithm.
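The abstract contrasts two graph-based bibliometric signals. As a rough illustration only, the Python sketch below shows what each one measures on a tiny hypothetical citation graph: bibliographic coupling counts the references two articles share, while direct citation, read here as simple label propagation, gives an article the majority field label of the articles it cites or is cited by. The corpus, the field labels, and the function names (`coupling_strength`, `direct_citation_label`) are invented for this example; it is not the authors' implementation and ignores everything that matters at the scale of 40 million articles.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy corpus: each article maps to the set of articles it cites.
references = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z", "W"},
    "X": set(), "Y": set(), "Z": set(), "W": set(),
}

# Hypothetical known field labels for some articles (the "seed" labels).
labels = {"X": "Physics", "Y": "Physics", "Z": "Chemistry", "W": "Chemistry"}


def coupling_strength(a: str, b: str) -> int:
    """Bibliographic coupling: number of references shared by articles a and b."""
    return len(references[a] & references[b])


def direct_citation_label(article: str) -> Optional[str]:
    """Majority label among articles linked to `article` by a direct citation,
    in either direction (articles it cites and articles citing it)."""
    neighbours = set(references[article])
    neighbours |= {src for src, refs in references.items() if article in refs}
    votes = Counter(labels[n] for n in neighbours if n in labels)
    if not votes:
        return None  # no labelled neighbour: nothing to propagate
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(coupling_strength("A", "B"))  # A and B share references X and Y
    print(direct_citation_label("A"))   # two Physics votes vs one Chemistry
```

In this toy setting, A and B have a coupling strength of 2 because both cite X and Y, and A inherits "Physics" because two of its three cited articles carry that label. A full classification would aggregate such signals over the whole graph, for example by clustering on coupling strengths or by iterating the propagation step.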

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking
Bibliographies as Topic
Bibliometrics*
Databases, Bibliographic
Deep Learning*
Publications / classification*
Scholarly Communication / statistics & numerical data
Science*

Grants and funding

The funders, Elsevier B.V. and its subsidiaries Science-Metrix Inc. and 1science, provided support in the form of salaries for authors MR, EVG, and EA, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.