Establishing semantic relatedness through ratings, reaction times, and semantic vectors: A database in Polish

Karolina Rataj; Patrycja Kakuba; Paweł Mandera; Walter J B van Heuven

doi:10.1371/journal.pone.0284801

PLoS One. 2023 Apr 24;18(4):e0284801. doi: 10.1371/journal.pone.0284801. eCollection 2023.

Authors

Karolina Rataj¹, Patrycja Kakuba², Paweł Mandera³, Walter J B van Heuven⁴

Affiliations

¹ Faculty of English, Neuroscience of Language Laboratory, Adam Mickiewicz University, Poznań, Poland.
² Faculty of English, Department of Psycholinguistic Studies, Adam Mickiewicz University, Poznań, Poland.
³ Lingvist Technologies, Tallinn, Estonia.
⁴ School of Psychology, University of Nottingham, Nottingham, United Kingdom.

Abstract

This study presents a Polish semantic priming dataset and semantic similarity ratings for word pairs, both obtained from native Polish speakers, as well as a range of semantic spaces. The word pairs include strongly related, weakly related, and semantically unrelated pairs. The rating study (Experiment 1) confirmed that the three conditions differed in semantic relatedness. The semantic priming lexical decision study (Experiment 2), which used a carefully matched subset of the stimuli, revealed strong semantic priming effects for strongly related word pairs, whereas weakly related word pairs showed a smaller but still significant priming effect relative to semantically unrelated word pairs. The datasets from both experiments, together with SimLex-999 for Polish, were then used for robust semantic model selection among existing and newly trained semantic spaces. This database of semantic vectors, semantic relatedness ratings, and behavioral data collected for all word pairs enables future researchers to benchmark new vectors against this dataset. Furthermore, the new vectors are made freely available to researchers. Although comparable sets of strongly and weakly related word pairs are available in other languages, this is the first freely available database for Polish that combines measures of semantic distance and human data.
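As an illustration of what benchmarking vectors against relatedness ratings can look like, the following minimal Python sketch scores each word pair by the cosine similarity of its two vectors and then computes a Spearman rank correlation with human ratings. The abstract does not state which similarity measure or evaluation statistic the authors used, so both choices are assumptions here. The vectors, word pairs, and ratings are invented toy values, not data from the database, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy 3-dimensional vectors (made up for illustration only).
vectors = {
    "kot": [0.9, 0.1, 0.0],
    "pies": [0.8, 0.2, 0.1],
    "mysz": [0.5, 0.5, 0.1],
    "stół": [0.0, 0.1, 0.9],
}

# Hypothetical word pairs with invented relatedness ratings.
pairs = [("kot", "pies"), ("kot", "mysz"), ("kot", "stół")]
ratings = [6.5, 4.0, 1.2]

model_scores = [cosine_similarity(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_scores, ratings)
print(f"Spearman correlation between model and ratings: {rho:.3f}")
```

A rank correlation is used in this sketch because it depends only on the ordering of the pairs, not on the scale of the ratings or of the similarity scores. The same comparison could be run against reaction-time data in place of ratings.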

Copyright: © 2023 Rataj et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Language*
Poland
Reaction Time
Semantics*

Grants and funding

This research was supported by the National Science Center (grant no. UMO-2017/25/B/HS6/00676). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.