NLP4NLP+5: The Deep (R)evolution in Speech and Language Processing

Joseph Mariani; Gil Francopoulo; Patrick Paroubek; Frédéric Vernier

doi:10.3389/frma.2022.863126

Front Res Metr Anal. 2022 Jul 27;7:863126. doi: 10.3389/frma.2022.863126. eCollection 2022.

Authors

Joseph Mariani¹, Gil Francopoulo², Patrick Paroubek¹, Frédéric Vernier¹

Affiliations

¹ Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay, France.
² Tagmatica, Paris, France.

Abstract

This paper analyzes the changes in the fields of speech and natural language processing over the past 5 years (2016-2020). It continues a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965-2015) and which was analyzed with methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years and comprises close to 90,000 documents [+30% compared with NLP4NLP: as many articles were published in the single year 2020 as over the first 25 years (1965-1989)], 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%). The analyses are conducted globally or comparatively among sources, and also in comparison with the general scientific literature, with a focus on the past 5 years. They reveal profound changes in research topics, the emergence of a new generation of authors, and the appearance of new publications around artificial intelligence, neural networks, machine learning, and word embedding.

Keywords: artificial intelligence; machine learning; natural language processing; neural networks; research metrics; speech processing; text mining.