Applications of monitoring and tracing the evolution of clustering solutions in dynamic datasets

J Appl Stat. 2021 Dec 7;50(4):1017-1035. doi: 10.1080/02664763.2021.2008882. eCollection 2023.

Abstract

The clustering approach is widely accepted as the most prominent unsupervised learning problem in data mining techniques. This procedure deals with the identification of notable structures in unlabeled datasets. In modern days clustering of dynamic data, streams play a vital role in policy-making, and researchers are paying particular attention to monitoring the evolution of clustering solutions over time. The data streams evolve continually, and different sources generate data items over time. The clustering solution over this stream is not stationary and changes with the influx of new data items. This paper presents a comprehensive study of algorithms related to tracing the evolution of clusters over time in cumulative datasets. To demonstrate the applications and significance of the tracing cluster evolution, we implement the MONIC algorithm in R-software. This article illustrates how the data segmentation of dynamic streams is done and shows the applications of monitoring changes in clustering solutions with the help of real-life published datasets.

Keywords: Clustering; R; cumulative datasets; monitoring changes; transition.