Absolute Cluster Validity

IEEE Trans Pattern Anal Mach Intell. 2020 Sep;42(9):2096-2112. doi: 10.1109/TPAMI.2019.2912970. Epub 2019 Apr 23.


Applying clustering involves interpreting objects placed in multi-dimensional spaces. The clustering task is inherently subjective, and the optimal solution can be extremely costly to find, or even unreachable or nonexistent. This introduces a trade-off between accuracy and computational effort, particularly since engineering applications usually work well with suboptimal solutions. In such applied scenarios, cluster validation is mandatory to refine algorithms and ensure that solutions are meaningful. Validity indices are commonly intended to benchmark different clustering setups; they are therefore relative coefficients, i.e., useful when compared with one another. In this paper, we propose a validation methodology that enables absolute evaluations of clustering results. Our method performs geometric measurements of the solution space and provides a coherent interpretation of the data structure by using indices based on inter- and intra-cluster distances, density, and multimodality within clusters. Tests and comparisons with well-known indices show that our validation methodology improves the robustness of clustering applications for knowledge discovery. Whereas clustering is often performed as a black-box technique, our index is interpretable and therefore allows the implementation of systems with self-checking capabilities.
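
As a rough illustration of the kind of geometric measurements the abstract mentions, the following minimal sketch computes two of the basic ingredients, intra-cluster and inter-cluster distances, for a labeled dataset. It is not the index proposed in the paper: the function name `cluster_distance_summary`, the use of centroids, and the choice of Euclidean distance are all assumptions made here for illustration, and the density and multimodality measurements are not covered.

```python
import numpy as np


def cluster_distance_summary(X, labels):
    """Illustrative helper (hypothetical, not the paper's index).

    Returns the cluster ids, the mean Euclidean distance of each cluster's
    points to its centroid (intra-cluster), and the matrix of Euclidean
    distances between cluster centroids (inter-cluster).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)

    # One centroid per cluster, in the order given by `ids`.
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])

    # Mean distance from each point to its own cluster centroid.
    intra = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])

    # Pairwise distances between centroids via broadcasting.
    diff = centroids[:, None, :] - centroids[None, :, :]
    inter = np.linalg.norm(diff, axis=2)

    return ids, intra, inter


# Toy example: two compact, well-separated clusters in 2-D.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
assignment = [0, 0, 1, 1]
ids, intra, inter = cluster_distance_summary(points, assignment)
```

Clusters whose intra-cluster distances are small relative to the distances separating their centroids are, under these assumptions, compact and well separated; an absolute validity assessment would additionally need to account for density and multimodality, as the abstract notes.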