PLoS One. 2015 Mar 20;10(3):e0117390.
doi: 10.1371/journal.pone.0117390. eCollection 2015.

Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension

Yuanchao Liu et al. PLoS One.

Abstract

The objective of text clustering is to divide a document collection into clusters based on the similarity between documents. This paper proposes an extension-based feature modeling approach for semantically sensitive text clustering, together with the corresponding feature space construction and similarity computation methods. Combining the similarity in the traditional feature space with that in the extension space mitigates the adverse effects of the complexity and diversity of natural language and improves the semantic sensitivity of clustering. The generated clusters can be organized at different granularities. Experimental evaluations with well-known clustering algorithms and datasets verify the effectiveness of the approach.
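
The idea of combining the two similarities can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the linear interpolation below, the `blend` parameter (named after the "blend factor" in the caption of Fig 5), the use of cosine similarity, and the toy feature vectors are all assumptions made for illustration, not the authors' method.

```python
import math
from typing import Dict

# A sparse feature vector: feature name -> weight (e.g. a TF-IDF weight).
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def blended_similarity(
    trad_a: Vector,
    trad_b: Vector,
    ext_a: Vector,
    ext_b: Vector,
    blend: float = 0.5,
) -> float:
    """Hypothetical blend: (1 - blend) * traditional + blend * extension.

    The linear form is an assumption; the paper may define it differently.
    """
    if not 0.0 <= blend <= 1.0:
        raise ValueError("blend must lie in [0, 1]")
    return (1.0 - blend) * cosine(trad_a, trad_b) + blend * cosine(ext_a, ext_b)


# Toy documents that share no surface words ...
doc_a_trad = {"car": 1.0, "engine": 1.0}
doc_b_trad = {"automobile": 1.0, "motor": 1.0}
# ... but map to the same (made-up) extension feature.
doc_a_ext = {"vehicle": 1.0}
doc_b_ext = {"vehicle": 1.0}

score = blended_similarity(doc_a_trad, doc_b_trad, doc_a_ext, doc_b_ext, blend=0.5)
print(score)
```

With `blend=0` the score reduces to the traditional-space similarity, which is zero for these two documents because they share no terms; raising `blend` lets the shared extension feature pull them together, which is the effect the abstract attributes to the extension space.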

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. The extension at the word level.
Fig 2. The LDA model.
Fig 3. Selection of keywords using lexicon chain technology (CNumber is the index of the different lexicon chains).
Fig 4. An example demonstrating the effect of semantic extension at the document level.
Fig 5. Clustering performance for different granularities and different blend factors.
(A) Evaluation method: accuracy; clustering method: k-means; dataset: dataset 1.
(B) Evaluation method: accuracy; clustering method: SOM; dataset: dataset 1.
(C) Evaluation method: BF; clustering method: k-means; dataset: dataset 1.
(D) Evaluation method: BF; clustering method: SOM; dataset: dataset 1.
(E) Evaluation method: accuracy; clustering method: k-means; dataset: dataset 2.
(F) Evaluation method: accuracy; clustering method: SOM; dataset: dataset 2.
(G) Evaluation method: BF; clustering method: k-means; dataset: dataset 2.
(H) Evaluation method: BF; clustering method: SOM; dataset: dataset 2.
Fig 6. The impact of feature selection on clustering results.
(A) Evaluation method: accuracy; clustering method: k-means; dataset: dataset 1.
(B) Evaluation method: accuracy; clustering method: SOM; dataset: dataset 1.
(C) Evaluation method: BF; clustering method: k-means; dataset: dataset 1.
(D) Evaluation method: BF; clustering method: SOM; dataset: dataset 1.
(E) Evaluation method: accuracy; clustering method: k-means; dataset: dataset 2.
(F) Evaluation method: accuracy; clustering method: SOM; dataset: dataset 2.
(G) Evaluation method: BF; clustering method: k-means; dataset: dataset 2.
(H) Evaluation method: BF; clustering method: SOM; dataset: dataset 2.

Grants and funding

The authors report no current funding sources for this study.
