Biomedical knowledge navigation by literature clustering

J Biomed Inform. 2007 Apr;40(2):114-30. doi: 10.1016/j.jbi.2006.07.004. Epub 2006 Aug 5.


There is an urgent need for a system that helps biomedical researchers survey the literature and then formulate hypotheses based on the knowledge stored in it. One approach is to cluster papers discussing a topic of interest and reveal its sub-topics, giving researchers an overview of the topic. We developed such a system, called McSyBi. It accepts a set of citation data retrieved with PubMed and clusters them, both hierarchically and non-hierarchically, based on the titles and abstracts, using statistical and natural language processing methods. A novel point is that McSyBi allows its users to change the clustering by entering a MeSH term or UMLS Semantic Type, so that they can view a set of citation data from multiple aspects. We evaluated McSyBi quantitatively and qualitatively by clustering 27 sets of citation data (40,643 different papers) and scrutinizing several of the resulting clusters. While non-hierarchical clustering provides an overview of the target topic, hierarchical clustering reveals more details and the relationships among citation data. McSyBi is freely available at
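The general idea of clustering citations by the text of their titles and abstracts can be illustrated with a minimal sketch. It is not McSyBi's implementation: the abstract does not name the statistical or natural language processing methods used, so the sketch assumes TF-IDF term weighting, cosine similarity, and average-linkage agglomerative clustering as stand-ins. All function names and the example titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in token_lists for t in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vec = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering (assumed technique).

    Merges the two most similar clusters until k remain. Returns the
    flat clusters (lists of document indices) and the merge history,
    which records the hierarchy from the bottom up.
    """
    k = max(1, k)
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters, merges


if __name__ == "__main__":
    # Made-up titles for illustration only; not data from the paper.
    titles = [
        "Gene expression profiling in breast cancer tumors",
        "Breast cancer gene expression signatures predict survival",
        "Text mining of biomedical literature abstracts",
        "Clustering biomedical literature abstracts with text mining",
    ]
    flat, history = agglomerate(tfidf_vectors(titles), k=2)
    print("flat clusters:", flat)
    for left, right, score in history:
        print("merged", left, "+", right, "at similarity", round(score, 3))
```

In this sketch, the flat clusters play the role of the overview and the merge history plays the role of the hierarchy. Re-clustering around a user-supplied MeSH term or UMLS Semantic Type, as the abstract describes, is not modeled here.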

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Biology / methods
  • Cluster Analysis*
  • Database Management Systems*
  • Information Storage and Retrieval / methods*
  • Medicine / methods
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods
  • Periodicals as Topic*
  • PubMed*