Text Mining in Biomedical Domain with Emphasis on Document Clustering

Healthc Inform Res. 2017 Jul;23(3):141-146. doi: 10.4258/hir.2017.23.3.141. Epub 2017 Jul 31.

Abstract

Objectives: With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents.

Methods: This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain.

Results: Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail.

Conclusions: Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

Keywords: Classification; Cluster Analysis; Natural Language Processing; Software; Text Mining.

Publication types

  • Review