Empirical distributional semantics: methods and biomedical applications

J Biomed Inform. 2009 Apr;42(2):390-405. doi: 10.1016/j.jbi.2009.02.002. Epub 2009 Feb 14.


Over the past 15 years, a range of methods has been developed that learn human-like estimates of the semantic relatedness between terms from the way these terms are distributed in a corpus of unannotated natural language text. These methods have been evaluated in a number of applications in the cognitive science, computational linguistics, and information retrieval literature. In this paper, we review the available methodologies for deriving semantic relatedness from free text, as well as their evaluation in a variety of biomedical and other applications. Recent methodological developments and their applicability to several existing applications are also discussed.
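
As a rough illustration of the general idea of estimating relatedness from how terms are distributed in unannotated text, the sketch below builds sliding-window co-occurrence vectors and compares them with cosine similarity. It is not any specific method reviewed in the paper; the function names (cooccurrence_vectors, cosine), the window size, and the three-sentence toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each term to a Counter of the terms seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (illustrative only, not data from the paper).
corpus = [
    "aspirin relieves mild headache",
    "ibuprofen relieves mild headache",
    "the stethoscope hangs on the wall",
]

vectors = cooccurrence_vectors(corpus)
sim_drugs = cosine(vectors["aspirin"], vectors["ibuprofen"])
sim_unrelated = cosine(vectors["aspirin"], vectors["stethoscope"])
print(sim_drugs, sim_unrelated)
```

In this toy corpus, "aspirin" and "ibuprofen" never appear in the same sentence, yet they score as highly related because they occur in the same contexts, while "aspirin" and "stethoscope" share no context terms. A realistic system would use a far larger corpus and would typically add term weighting and dimensionality reduction.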

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Abstracting and Indexing
  • Computational Biology / methods*
  • Information Storage and Retrieval / methods*
  • Models, Statistical
  • Natural Language Processing
  • Neural Networks, Computer
  • Reproducibility of Results
  • Semantics*
  • Software
  • Vocabulary, Controlled