Discrete model-based clustering with overlapping subsets of attributes

Fernando Rodriguez-Sanchez, Pedro Larrañaga, Concha Bielza
Proceedings of the Ninth International Conference on Probabilistic Graphical Models, PMLR 72:392-403, 2018.

Abstract

Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting each attribute to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. To learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.

Cite this Paper


BibTeX
@InProceedings{pmlr-v72-rodriguez-sanchez18a,
  title = {Discrete model-based clustering with overlapping subsets of attributes},
  author = {Rodriguez-Sanchez, Fernando and Larra\~{n}aga, Pedro and Bielza, Concha},
  booktitle = {Proceedings of the Ninth International Conference on Probabilistic Graphical Models},
  pages = {392--403},
  year = {2018},
  editor = {Václav Kratochvíl and Milan Studený},
  volume = {72},
  series = {Proceedings of Machine Learning Research},
  address = {Prague, Czech Republic},
  month = {11--14 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v72/rodriguez-sanchez18a/rodriguez-sanchez18a.pdf},
  url = {http://proceedings.mlr.press/v72/rodriguez-sanchez18a.html},
  abstract = {Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated to a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions. Each solution being represented by a cluster variable. However, restricting attributes to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of models that allows cluster variables to have overlapping subsets of attributes between them. In order to learn these models, we propose to combine a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state-of-the-art.}
}
Endnote
%0 Conference Paper
%T Discrete model-based clustering with overlapping subsets of attributes
%A Fernando Rodriguez-Sanchez
%A Pedro Larrañaga
%A Concha Bielza
%B Proceedings of the Ninth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2018
%E Václav Kratochvíl
%E Milan Studený
%F pmlr-v72-rodriguez-sanchez18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 392--403
%U http://proceedings.mlr.press
%V 72
%W PMLR
%X Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated to a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions. Each solution being represented by a cluster variable. However, restricting attributes to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of models that allows cluster variables to have overlapping subsets of attributes between them. In order to learn these models, we propose to combine a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state-of-the-art.
APA
Rodriguez-Sanchez, F., Larrañaga, P., & Bielza, C. (2018). Discrete model-based clustering with overlapping subsets of attributes. Proceedings of the Ninth International Conference on Probabilistic Graphical Models, in PMLR 72:392-403.
