DISGROU: an algorithm for discontinuous subgroup discovery

PeerJ Comput Sci. 2021 Apr 27;7:e512. doi: 10.7717/peerj-cs.512. eCollection 2021.

Abstract

This paper addresses the problem of subgroup discovery in numerical data. Subgroup discovery aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. We present DISGROU, a new approach that identifies subgroups whose attribute intervals may be discontinuous. Unlike the main algorithms in the field, the originality of our proposal lies in the way it breaks down the attribute intervals during the subgroup search phase. The basic assumption of our approach is that allowing the attribute ranges that define a subgroup to be disjoint can improve the quality of the identified subgroups. Traditional methods in the field search for subgroups only over continuous intervals, which yields subgroups defined over wider intervals that contain some irrelevant objects, which in turn degrade the quality function. Another advantage of our approach is that it does not require a prior discretization of the attributes, since it works directly on the numerical attributes. The efficiency of our proposal is demonstrated first by comparing its results with those of two reference algorithms in the field and then by applying it to a case study.
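The motivation for discontinuous intervals can be illustrated with a minimal Python sketch. It is not the DISGROU algorithm: the toy data, the helper names `covers` and `quality`, and the quality measure (coverage multiplied by the shift of the subgroup's target mean from the global mean) are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only compares one continuous interval with a union of two disjoint intervals on the same attribute.

```python
from statistics import mean

def covers(x, intervals):
    """Return True if x falls inside any of the closed intervals (lo, hi)."""
    return any(lo <= x <= hi for lo, hi in intervals)

def quality(xs, ys, intervals):
    """Assumed quality measure: coverage * (subgroup target mean - global target mean)."""
    selected = [y for x, y in zip(xs, ys) if covers(x, intervals)]
    if not selected:
        return 0.0
    return len(selected) / len(ys) * (mean(selected) - mean(ys))

# Hypothetical toy data: one numerical attribute and one numerical target.
# High target values occur for x in [2, 3] and [6, 7], with low values in between.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 9, 9, 1, 1, 9, 9, 1, 1, 1]

# A single continuous interval must also include the low-target objects at x = 4 and x = 5.
continuous = [(2, 7)]
# A discontinuous description leaves those objects out.
discontinuous = [(2, 3), (6, 7)]

q_cont = quality(xs, ys, continuous)
q_disc = quality(xs, ys, discontinuous)

print(f"continuous interval:    quality = {q_cont:.2f}")
print(f"discontinuous interval: quality = {q_disc:.2f}")
```

Under these assumptions, the discontinuous description scores higher because it excludes the two low-target objects that the continuous interval is forced to cover, even though it covers fewer objects overall.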

Keywords: Data mining; Descriptive modeling; Knowledge discovery; Subgroup discovery.

Grant support

This work is supported by the ERA4CS project INNOVA (Grant Agreement number 690462). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.