Constructing Semantic Models From Words, Images, and Emojis

Cogn Sci. 2020 Apr;44(4):e12830. doi: 10.1111/cogs.12830.

Abstract

A number of recent models of semantics combine linguistic information, derived from text corpora, with visual information, derived from image collections, and demonstrate that the resulting multimodal models account for behavioral data better than either of their unimodal counterparts. Empirical work on semantic processing has shown that emotion also plays an important role, especially for abstract concepts; however, models that integrate emotion with linguistic and visual information are lacking. Here, we first improve on visual and affective representations derived from existing state-of-the-art models, by choosing the models that best fit available human semantic data and by extending the number of concepts they cover. Crucially, we then assess whether adding affective representations (obtained from a neural network model designed to predict emojis from co-occurring text) improves the fit to semantic similarity/relatedness judgments relative to a purely linguistic model and a linguistic-visual model. We find that, given specific weights assigned to the models, adding both visual and affective representations improves performance: visual representations improve the fit especially for more concrete words, and affective representations especially for more abstract words.
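
The idea of assigning weights to the linguistic, visual, and affective models can be illustrated with a minimal sketch. The abstract does not state how the modalities are combined, so the weighted sum of per-modality cosine similarities below is an assumption rather than the authors' method. The function names, the toy two-dimensional vectors, and the weight values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def combined_similarity(word_a, word_b, spaces, weights):
    """Weighted average of per-modality cosine similarities.

    spaces  maps a modality name to a {word: vector} dictionary.
    weights maps a modality name to a non-negative weight.
    ASSUMPTION: late fusion by weighted averaging; the paper may
    combine the modalities differently.
    """
    total = sum(weights.values())
    return sum(
        (w / total) * cosine(spaces[name][word_a], spaces[name][word_b])
        for name, w in weights.items()
    )


# Hypothetical toy vectors, not data from the paper.
spaces = {
    "linguistic": {"dog": [1.0, 0.0], "cat": [1.0, 0.0], "grief": [0.0, 1.0]},
    "visual": {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "grief": [0.0, 1.0]},
    "affective": {"dog": [1.0, 0.0], "cat": [1.0, 0.0], "grief": [-1.0, 0.0]},
}

# Hypothetical weights; in practice they would be tuned against human ratings.
weights = {"linguistic": 0.5, "visual": 0.25, "affective": 0.25}

dog_cat = combined_similarity("dog", "cat", spaces, weights)
dog_grief = combined_similarity("dog", "grief", spaces, weights)
```

Under this scheme, model fit would typically be assessed by correlating the combined scores with human similarity/relatedness ratings over many word pairs, and the weights could be varied to see how much each modality contributes for concrete versus abstract words.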

Keywords: Concreteness; Distributional models; Emotion; Language; Multimodal models; Similarity/relatedness; Vision.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Concept Formation*
  • Emotions
  • Humans
  • Models, Psychological*
  • Semantics*