Discrete model-based clustering with overlapping subsets of attributes


Fernando Rodriguez-Sanchez, Pedro Larrañaga, Concha Bielza;
Proceedings of the Ninth International Conference on Probabilistic Graphical Models, PMLR 72:392-403, 2018.

Abstract

Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting each attribute to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. In order to learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.
