Supervised approach to recognize question type in a QA system for health

Stud Health Technol Inform. 2008;136:407-12.


Many attempts have been made in the question answering (QA) domain, but no system applicable to the field of health is currently available on the Internet. This paper describes a bilingual French/English question answering system adapted to the health domain, and more particularly the detection of the question's model. The Question Analyzer module, which identifies the question's model, has a major effect on the rest of the QA system. Our original hypothesis is that a question can be defined by two criteria: the type of answer expected and the medical type. Both must be taken into account when detecting the model, in order to better define the type of question and thus the corresponding answer. To this end, questions were collected on the Internet and given to experts, who classified them according to these two criteria. In addition, supervised and unsupervised classification tests were carried out to determine the features of questions. This first step led to the choice of classification algorithms: the categorizers giving the best results were support vector machines (SVM). Currently, for a set of 100 questions, 84 are correctly categorized in English and 68 in French according to the type of answer expected. These figures fall to less than 50% for the medical type. Evaluations have shown that the system identifies the type of answer expected well but could be improved for the medical type. This leads us to use an external source of knowledge, the UMLS. A future improvement will be to use the UMLS Semantic Network to better categorize a query according to the medical domain.
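The two-criteria hypothesis and the per-criterion evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `QuestionModel` type, the `per_criterion_accuracy` helper, and all label values are hypothetical, and the four toy questions are invented for illustration only. The sketch scores each criterion separately, which is how figures such as "84 of 100 for the type of answer expected" can be read.

```python
from typing import Dict, NamedTuple, Sequence


class QuestionModel(NamedTuple):
    """A question's model under the two-criteria hypothesis (names are assumptions)."""
    answer_type: str   # type of answer expected, e.g. "definition" (hypothetical label)
    medical_type: str  # medical type, e.g. "treatment" (hypothetical label)


def per_criterion_accuracy(
    gold: Sequence[QuestionModel], predicted: Sequence[QuestionModel]
) -> Dict[str, float]:
    """Return the fraction of questions categorized correctly for each criterion."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    n = len(gold)
    answer_hits = sum(g.answer_type == p.answer_type for g, p in zip(gold, predicted))
    medical_hits = sum(g.medical_type == p.medical_type for g, p in zip(gold, predicted))
    return {"answer_type": answer_hits / n, "medical_type": medical_hits / n}


# Toy expert labels and toy classifier output (invented, not data from the paper).
gold = [
    QuestionModel("definition", "disease"),
    QuestionModel("yes_no", "treatment"),
    QuestionModel("list", "symptom"),
    QuestionModel("definition", "treatment"),
]
predicted = [
    QuestionModel("definition", "disease"),
    QuestionModel("yes_no", "disease"),
    QuestionModel("list", "symptom"),
    QuestionModel("yes_no", "symptom"),
]

scores = per_criterion_accuracy(gold, predicted)
print(scores)
```

Scoring the two criteria independently makes it visible when one criterion (here the medical type) lags behind the other, which is the situation the abstract reports.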

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Computer Systems
  • Consumer Health Information*
  • Expert Systems
  • Humans
  • Information Storage and Retrieval*
  • Internet*
  • Knowledge Bases
  • Medical Informatics Computing*
  • Multilingualism
  • Natural Language Processing
  • Semantics
  • Vocabulary, Controlled