Combining Open-domain and Biomedical Knowledge for Topic Recognition in Consumer Health Questions

AMIA Annu Symp Proc. 2017 Feb 10;2016:914-923. eCollection 2016.


Determining the main topics of consumer health questions is a crucial step in processing them, as it narrows the search space to a specific semantic context. In this paper, we propose a topic recognition approach based on biomedical and open-domain knowledge bases. In the first step of our method, we recognize named entities in consumer health questions using an unsupervised method that relies on a biomedical knowledge base, UMLS, and an open-domain knowledge base, DBpedia. In the second step, we cast topic recognition as a binary classification problem: deciding whether or not a given named entity is the question topic. We evaluated our approach on two datasets: one from the National Library of Medicine (NLM), introduced in this paper, and another from the Genetic and Rare Disease Information Center (GARD). Combining the knowledge bases outperformed each individual knowledge base by up to 16.5% F1 and achieved state-of-the-art performance. Our results demonstrate that combining open-domain knowledge bases with biomedical knowledge bases can substantially improve the understanding of user-generated health content.
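
The two-step pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two small term sets are hypothetical stand-ins for UMLS and DBpedia, the function names are invented for illustration, and the hand-written rule in `is_topic` only marks the place where a trained binary classifier would sit.

```python
import re

# Hypothetical toy lexicons standing in for the two knowledge bases.
# The real UMLS and DBpedia are far larger and are not accessed this way.
BIOMEDICAL_TERMS = {"cystic fibrosis", "insulin"}
OPEN_DOMAIN_TERMS = {"cystic fibrosis", "gluten free diet"}


def find_entities(question, lexicon):
    """Return (term, start_offset) pairs for lexicon terms found in the question."""
    text = question.lower()
    hits = []
    for term in lexicon:
        match = re.search(r"\b" + re.escape(term) + r"\b", text)
        if match:
            hits.append((term, match.start()))
    return hits


def candidate_entities(question):
    """Step 1: merge entities from both sources, recording which sources matched."""
    candidates = {}
    sources = (("biomedical", BIOMEDICAL_TERMS), ("open", OPEN_DOMAIN_TERMS))
    for source, lexicon in sources:
        for term, start in find_entities(question, lexicon):
            entry = candidates.setdefault(term, {"start": start, "sources": set()})
            entry["sources"].add(source)
    return candidates


def is_topic(entry, question_length):
    """Step 2: binary decision per entity (assumed rule, not a trained model)."""
    appears_early = entry["start"] < question_length / 2
    found_in_both = len(entry["sources"]) == 2
    return appears_early and found_in_both


def recognize_topics(question):
    """Run both steps and return the entities judged to be question topics."""
    candidates = candidate_entities(question)
    return sorted(
        term for term, entry in candidates.items()
        if is_topic(entry, len(question))
    )
```

Calling `recognize_topics("My son has cystic fibrosis. Is a gluten free diet helpful for him?")` keeps only the entity that both toy lexicons contain and that appears early in the question. In a real system, the position and source features would be inputs to a classifier trained on annotated questions rather than to a fixed rule.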

MeSH terms

  • Consumer Health Information*
  • Datasets as Topic
  • Humans
  • Information Seeking Behavior*
  • Information Storage and Retrieval / methods
  • Knowledge Bases*
  • Natural Language Processing
  • Semantics