Lightweight predicate extraction for patient-level cancer information and ontology development

BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):73. doi: 10.1186/s12911-017-0465-x.

Abstract

Background: Knowledge engineering for ontological knowledgebases is resource- and time-intensive. To alleviate these issues, especially for novices, automated tools from the natural language processing domain can assist in ontology development. We focus on developing ontologies for the public health domain and use patient-centric sources from MedlinePlus related to cancers caused by HPV.

Methods: This paper demonstrates the use of a lightweight open information extraction (OIE) tool to derive accurate knowledge triples that can seed an ontological knowledgebase. We developed a custom application, which interfaced with an information extraction software library, to facilitate producing knowledge triples from textual sources.

Results: Our approach generated accurate extractions, with precision ranging from 80% to 89%. These triples can later be transformed into an OWL/RDF representation for our planned ontological knowledgebase.
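
As an illustration only, the following minimal Python sketch shows one way an extracted triple could be serialized as an RDF N-Triples statement, and how precision over manually judged extractions could be computed. It is not the authors' application: the namespace `http://example.org/hpv#`, the `Triple` class, the helper names, and the sample triple are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace for illustration; not taken from the paper.
BASE = "http://example.org/hpv#"


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) knowledge triple from an OIE tool."""
    subject: str
    predicate: str
    obj: str


def to_iri(text: str) -> str:
    """Turn a free-text phrase into an IRI under the assumed namespace."""
    slug = "_".join(text.strip().lower().split())
    return f"<{BASE}{quote(slug)}>"


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples statement."""
    return (
        f"{to_iri(triple.subject)} "
        f"{to_iri(triple.predicate)} "
        f"{to_iri(triple.obj)} ."
    )


def precision(judgments: list[bool]) -> float:
    """Fraction of extractions judged correct by a human reviewer."""
    if not judgments:
        raise ValueError("at least one judged extraction is required")
    return sum(judgments) / len(judgments)


# Made-up example triple, not an extraction reported in the paper.
example = Triple("HPV", "can cause", "cervical cancer")
statement = to_ntriples(example)
```

Treating every phrase as an IRI is a simplification; a real ontology would map predicates to object or data properties and might keep some objects as literals.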

Conclusions: OIE delivers an effective and accessible method for developing ontologies.

Keywords: Natural language processing; Ontology learning; Open information extraction; Public health; Semi-automated ontology development.

MeSH terms

  • Biological Ontologies*
  • Humans
  • MedlinePlus*
  • Natural Language Processing*
  • Neoplasms*
  • Public Health*