Constructing a Graph Database for Semantic Literature-Based Discovery

Stud Health Technol Inform. 2015:216:1094.

Abstract

Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries take the form of relations between biomedical concepts; for example, a drug may be found to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network: a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted from Medline with SemRep. SemMedDB is distributed as a MySQL relational database, which is not well suited to network data. We transformed SemMedDB and loaded it into the Neo4j graph database, and implemented the basic LBD algorithms in the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database, and that implementing LBD algorithms is conceptually simpler in a graph query language than in standard SQL.
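
The network view described above can be illustrated with a minimal sketch of open discovery in the style of Swanson's ABC model, which is assumed here to be among the "basic LBD algorithms"; the abstract does not name them. The sketch uses plain Python over an in-memory adjacency map rather than Neo4j and Cypher. The triples are toy data for illustration only, not rows from SemMedDB, and the names `TRIPLES`, `build_graph`, and `open_discovery` are hypothetical.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; illustrative only, not taken from SemMedDB.
TRIPLES = [
    ("fish oil", "AFFECTS", "blood viscosity"),
    ("fish oil", "AFFECTS", "platelet aggregation"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud disease"),
    ("platelet aggregation", "ASSOCIATED_WITH", "Raynaud disease"),
]


def build_graph(triples):
    """Map each subject concept to the set of (predicate, object) edges leaving it."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        graph[subject].add((predicate, obj))
    return graph


def open_discovery(graph, start):
    """Return concepts C reachable as start -> B -> C that have no direct start -> C edge.

    The result maps each candidate C to the set of intermediate concepts B
    that link it to the start concept.
    """
    # Concepts already directly related to the start concept are not new hypotheses.
    known = {obj for _, obj in graph.get(start, ())}
    candidates = defaultdict(set)
    for _, b in graph.get(start, ()):
        for _, c in graph.get(b, ()):
            if c != start and c not in known:
                candidates[c].add(b)
    return dict(candidates)


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for concept, links in open_discovery(graph, "fish oil").items():
        print(concept, "via", sorted(links))
```

In a graph database the same two-step traversal would be written as a single path pattern, whereas a relational schema would typically need a self-join on the relation table; the sketch only mirrors the traversal logic and makes no claim about the authors' actual queries.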

MeSH terms

  • Data Mining / methods*
  • Database Management Systems
  • Databases, Factual*
  • Machine Learning
  • Natural Language Processing*
  • Periodicals as Topic*
  • Semantics
  • Terminology as Topic*
  • Vocabulary, Controlled*