Multi-focus cluster labeling

J Biomed Inform. 2015 Jun;55:116-23. doi: 10.1016/j.jbi.2015.03.012. Epub 2015 Apr 11.

Abstract

Document collections resulting from searches in the biomedical literature, for instance, in PubMed, are often so large that some organization of the returned information is necessary. Clustering is an efficient tool for organizing search results. To help the user decide how to continue the search for relevant documents, the content of each cluster can be characterized by a set of representative keywords or cluster labels. As different users may have different interests, it is desirable to have solutions that can produce labels from a selection of different topical categories. We therefore introduce the concept of multi-focus cluster labeling, which gives users an overview of the contents through labels from multiple viewpoints. The concept of multi-focus cluster labeling has been established and demonstrated on three different document collections. We illustrate that multi-focus visualizations can give an overview of clusters along axes that general labels are not able to convey. The approach is generic and should be applicable to any biomedical (or other) domain with any selection of foci where appropriate focus vocabularies can be established. A user evaluation also indicates that such a multi-focus concept is useful.
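The idea of labeling one cluster from several focus vocabularies can be illustrated with a minimal sketch. This is not the method of the paper, whose scoring and vocabularies are not described in the abstract. The function name `multi_focus_labels`, the ranking by document frequency, the single-token lowercase vocabularies, and the example foci ("drug", "symptom", "organism") are all assumptions made for illustration.

```python
import re
from collections import Counter


def multi_focus_labels(cluster_docs, focus_vocabularies, top_k=3):
    """Pick up to top_k labels per focus for one cluster (illustrative only).

    Assumption: a term's score is the number of cluster documents that
    contain it, and every vocabulary term is a single lowercase token.
    """
    # Document frequency: count each token at most once per document.
    doc_freq = Counter()
    for doc in cluster_docs:
        doc_freq.update(set(re.findall(r"[a-z0-9]+", doc.lower())))

    labels = {}
    for focus, vocabulary in focus_vocabularies.items():
        # Keep only vocabulary terms that actually occur in the cluster.
        candidates = [(t, doc_freq[t]) for t in vocabulary if doc_freq[t] > 0]
        # Highest frequency first; ties broken alphabetically for stability.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        labels[focus] = [term for term, _ in candidates[:top_k]]
    return labels


# Hypothetical cluster and focus vocabularies, invented for this example.
cluster = [
    "Aspirin reduces fever in mice",
    "Ibuprofen and aspirin for headache",
    "Fever treated with ibuprofen in rats",
]
foci = {
    "drug": {"aspirin", "ibuprofen", "insulin"},
    "symptom": {"fever", "headache"},
    "organism": {"mice", "rats", "zebrafish"},
}
result = multi_focus_labels(cluster, foci, top_k=2)
for focus, terms in result.items():
    print(focus, terms)
```

Each focus yields its own label list for the same cluster, so a user interested in drugs and a user interested in organisms see different views of identical documents. A realistic system would likely need multi-word term matching and a score that contrasts the cluster against the rest of the collection.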

Keywords: Cluster labeling; Multi focus; Text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods*
  • Documentation / classification*
  • Documentation / statistics & numerical data
  • MEDLINE / classification*
  • MEDLINE / statistics & numerical data
  • Machine Learning
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods
  • User-Computer Interface*
  • Vocabulary, Controlled*