Embedding of Semantic Predications

J Biomed Inform. 2017 Apr;68:150-166. doi: 10.1016/j.jbi.2017.03.003. Epub 2017 Mar 8.

Abstract

This paper concerns the generation of distributed vector representations of biomedical concepts from structured knowledge, in the form of subject-relation-object triplets known as semantic predications. Specifically, we evaluate the extent to which Predication-based Semantic Indexing (PSI), a representational approach we previously developed for this purpose, might benefit from insights gleaned from neural-probabilistic language models, which have surged in popularity in recent years as a means of generating distributed vector representations of terms from free text. To do so, we develop a novel neural-probabilistic approach to encoding predications, called Embedding of Semantic Predications (ESP), by adapting aspects of the Skipgram with Negative Sampling (SGNS) algorithm. We compare ESP and PSI across a number of tasks, including recovery of encoded information, estimation of semantic similarity and relatedness, and identification of potentially therapeutic and harmful relationships using both analogical retrieval and supervised learning. We find advantages for ESP in some, but not all, of these tasks, revealing the contexts in which the additional computational work of neural-probabilistic modeling is justified.

Keywords: Distributional semantics; Literature-based discovery; Pharmacovigilance; Predication-based semantic indexing; Semantic predications; Word embeddings.

MeSH terms

  • Algorithms*
  • Humans
  • Natural Language Processing*
  • Semantics*