Discrete model-based clustering with overlapping subsets of attributes

Fernando Rodriguez-Sanchez; Pedro Larrañaga; Concha Bielza

Proceedings of the Ninth International Conference on Probabilistic Graphical Models, PMLR 72:392-403, 2018.

Abstract

Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting attributes to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. To learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.

Cite this Paper

BibTeX

@InProceedings{pmlr-v72-rodriguez-sanchez18a,
  title = 	 {Discrete model-based clustering with overlapping subsets of attributes},
  author =       {Rodriguez-Sanchez, Fernando and Larra\~{n}aga, Pedro and Bielza, Concha},
  booktitle = 	 {Proceedings of the Ninth International Conference on Probabilistic Graphical Models},
  pages = 	 {392--403},
  year = 	 {2018},
  editor = 	 {Kratochvíl, Václav and Studený, Milan},
  volume = 	 {72},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11--14 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v72/rodriguez-sanchez18a/rodriguez-sanchez18a.pdf},
  url = 	 {https://proceedings.mlr.press/v72/rodriguez-sanchez18a.html},
  abstract = 	 {Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting attributes to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. To learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.}
}

Endnote

%0 Conference Paper
%T Discrete model-based clustering with overlapping subsets of attributes
%A Fernando Rodriguez-Sanchez
%A Pedro Larrañaga
%A Concha Bielza
%B Proceedings of the Ninth International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2018
%E Václav Kratochvíl
%E Milan Studený
%F pmlr-v72-rodriguez-sanchez18a
%I PMLR
%P 392--403
%U https://proceedings.mlr.press/v72/rodriguez-sanchez18a.html
%V 72
%X Traditional model-based clustering methods assume that data instances can be grouped in a single “best” way. This is often untrue for complex data, where several meaningful sets of clusters may exist, each of them associated with a unique subset of data attributes. Current literature has approached this problem with models that consider disjoint subsets of attributes to define distinct clustering solutions, each solution being represented by a cluster variable. However, restricting attributes to a single cluster variable diminishes the expressiveness and quality of these models. For this reason, we propose a novel kind of model that allows cluster variables to have overlapping subsets of attributes. To learn these models, we propose combining a search-based method with an attribute clustering procedure. Experimental results with both synthetic and real-world data show the utility of our approach and its competitiveness with the state of the art.

APA

Rodriguez-Sanchez, F., Larrañaga, P., & Bielza, C. (2018). Discrete model-based clustering with overlapping subsets of attributes. Proceedings of the Ninth International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 72:392-403. Available from https://proceedings.mlr.press/v72/rodriguez-sanchez18a.html.
