Biological information extraction and co-occurrence analysis

Methods Mol Biol. 2014:1159:77-92. doi: 10.1007/978-1-4939-0709-0_5.


It is now possible to identify terms corresponding to biological entities within passages of biomedical text corpora; critically, the potential relationships between those entities then need to be detected. These relationships are typically detected by co-occurrence analysis, which reveals associations between bioentities through their coexistence in single sentences and/or entire abstracts. These associations implicitly define networks whose nodes represent terms, bioentities, or concepts and whose edges represent relationships; edge weights might represent confidence in these semantic connections. This chapter provides a review of current methods for co-occurrence analysis, focusing on data storage, analysis, and representation. We highlight scenarios in which these approaches are implemented by tools for information extraction and knowledge inference in the field of systems biology. We illustrate the practical utility of two online resources that provide services of this type, namely STRING and BioTextQuest, and conclude with a discussion of current challenges and future perspectives in the field.
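
To make the idea of sentence-level co-occurrence concrete, the following is a minimal sketch and not the method used by STRING or BioTextQuest. It assumes a predefined list of terms, a naive punctuation-based sentence splitter, case-insensitive whole-word matching, and raw co-occurrence counts as edge weights. The function name `cooccurrence_edges` and the sample sentences are hypothetical and serve only as illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstracts, terms):
    """Count how often each pair of terms appears in the same sentence.

    Returns a Counter mapping (term_a, term_b) -> count, with each pair
    stored in alphabetical order so that it is counted only once.
    """
    # Whole-word, case-insensitive matcher for every term (an assumption;
    # real systems rely on named-entity recognition and synonym dictionaries).
    patterns = {
        t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
        for t in terms
    }
    edges = Counter()
    for text in abstracts:
        # Naive sentence split: break after '.', '!' or '?' followed by space.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found = sorted(t for t, p in patterns.items() if p.search(sentence))
            for a, b in combinations(found, 2):
                edges[(a, b)] += 1
    return edges


# Illustrative input only; these sentences are made up for the example.
abstracts = [
    "TP53 interacts with MDM2. BRCA1 is unrelated here.",
    "MDM2 regulates TP53 and BRCA1.",
]
edges = cooccurrence_edges(abstracts, ["TP53", "MDM2", "BRCA1"])
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting pairs can be read as the weighted edges of a term network. Counting at the abstract level instead would only require skipping the sentence split, and a real pipeline would typically replace raw counts with a normalized confidence score.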

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Bibliometrics*
  • Concept Formation*
  • Data Mining / methods*