Ontology based mining of pathogen-disease associations from literature

J Biomed Semantics. 2019 Sep 18;10(1):15. doi: 10.1186/s13326-019-0208-2.

Abstract

Background: Infectious diseases claim millions of lives each year, especially in developing countries. Accurate and rapid identification of causative pathogens plays a key role in successful treatment. To support research on infectious diseases and mechanisms of infection, there is a need for an open resource on pathogen-disease associations that can be used in computational studies. A large number of pathogen-disease associations are available in the literature only in unstructured form, so automated methods are needed to extract them.

Results: We developed a text mining system for extracting pathogen-disease relations from the literature. Our approach combines background knowledge from an ontology with statistical methods to extract associations between pathogens and diseases. In total, we extracted 3420 pathogen-disease associations from the literature. We integrated these literature-derived associations into a database that links pathogens to their phenotypes to support infectious disease research.
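
The general idea of combining ontology-derived vocabulary with a statistical filter can be illustrated with a minimal sketch. The abstract does not name the statistic or the matching strategy, so everything below is an assumption made for illustration: the lexicons, the identifiers, the example sentences, sentence-level co-occurrence, and the use of normalized pointwise mutual information (NPMI) as the score. It should not be read as the authors' implementation.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons: surface forms (labels and synonyms, as an
# ontology might provide) mapped to made-up concept identifiers.
PATHOGEN_LEXICON = {
    "plasmodium falciparum": "pathogen:plasmodium_falciparum",
    "p. falciparum": "pathogen:plasmodium_falciparum",
    "borrelia burgdorferi": "pathogen:borrelia_burgdorferi",
}
DISEASE_LEXICON = {
    "malaria": "disease:malaria",
    "lyme disease": "disease:lyme_disease",
}

# Invented example sentences standing in for a literature corpus.
SENTENCES = [
    "Plasmodium falciparum causes malaria.",
    "Severe malaria due to P. falciparum is common in children.",
    "Borrelia burgdorferi is the causative agent of Lyme disease.",
    "Malaria remains a major public health burden.",
]


def find_mentions(sentence, lexicon):
    """Return the set of concept identifiers whose labels occur in the sentence."""
    text = sentence.lower()
    return {
        concept_id
        for label, concept_id in lexicon.items()
        if re.search(r"\b" + re.escape(label) + r"\b", text)
    }


def npmi(pair_count, x_count, y_count, total):
    """Normalized pointwise mutual information, ranging from -1 to 1."""
    p_xy = pair_count / total
    if p_xy == 1.0:
        # Both concepts occur together in every sentence.
        return 1.0
    pmi = math.log(p_xy / ((x_count / total) * (y_count / total)))
    return pmi / -math.log(p_xy)


def score_associations(sentences):
    """Score every co-occurring (pathogen, disease) pair by NPMI."""
    pathogen_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()
    for sentence in sentences:
        pathogens = find_mentions(sentence, PATHOGEN_LEXICON)
        diseases = find_mentions(sentence, DISEASE_LEXICON)
        pathogen_counts.update(pathogens)
        disease_counts.update(diseases)
        pair_counts.update(product(pathogens, diseases))
    total = len(sentences)
    return {
        (p, d): npmi(n, pathogen_counts[p], disease_counts[d], total)
        for (p, d), n in pair_counts.items()
    }
```

Calling `score_associations(SENTENCES)` on this toy corpus yields two candidate pairs. A real pipeline would then keep only pairs above a chosen score or co-occurrence threshold; where to set that threshold is a design decision the abstract does not describe.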

Conclusions: To the best of our knowledge, this is the first study focusing on extracting pathogen-disease associations from publications. We believe the text-mined data can serve as a valuable resource for infectious disease research. All data are publicly available at https://github.com/bio-ontology-research-group/padimi and through a public SPARQL endpoint at http://patho.phenomebrowser.net/ .

Keywords: Infectious disease; Pathogen; Pathogen–disease association; Relationship extraction; Text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Ontologies*
  • Communicable Diseases*
  • Data Mining / methods*
  • Internet