Cluster Explanation via Polyhedral Descriptions

Connor Lawless, Oktay Gunluk
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:18652-18666, 2023.

Abstract

This paper focuses on the cluster description problem where, given a dataset and its partition into clusters, the task is to explain the clusters. We introduce a new approach to explain clusters by constructing a polyhedron around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description. We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. To deal with large datasets, we introduce a novel grouping scheme that first forms smaller groups of data points and then builds the polyhedra around the grouped data, a strategy which out-performs the common approach of sub-sampling data. Compared to state of the art cluster description algorithms, our approach is able to achieve competitive interpretability with improved description accuracy.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lawless23a, title = {Cluster Explanation via Polyhedral Descriptions}, author = {Lawless, Connor and Gunluk, Oktay}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {18652--18666}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/lawless23a/lawless23a.pdf}, url = {https://proceedings.mlr.press/v202/lawless23a.html}, abstract = {This paper focuses on the cluster description problem where, given a dataset and its partition into clusters, the task is to explain the clusters. We introduce a new approach to explain clusters by constructing a polyhedron around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description. We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. To deal with large datasets, we introduce a novel grouping scheme that first forms smaller groups of data points and then builds the polyhedra around the grouped data, a strategy which out-performs the common approach of sub-sampling data. Compared to state of the art cluster description algorithms, our approach is able to achieve competitive interpretability with improved description accuracy.} }
Endnote
%0 Conference Paper %T Cluster Explanation via Polyhedral Descriptions %A Connor Lawless %A Oktay Gunluk %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-lawless23a %I PMLR %P 18652--18666 %U https://proceedings.mlr.press/v202/lawless23a.html %V 202 %X This paper focuses on the cluster description problem where, given a dataset and its partition into clusters, the task is to explain the clusters. We introduce a new approach to explain clusters by constructing a polyhedron around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description. We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. To deal with large datasets, we introduce a novel grouping scheme that first forms smaller groups of data points and then builds the polyhedra around the grouped data, a strategy which out-performs the common approach of sub-sampling data. Compared to state of the art cluster description algorithms, our approach is able to achieve competitive interpretability with improved description accuracy.
APA
Lawless, C. & Gunluk, O.. (2023). Cluster Explanation via Polyhedral Descriptions. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:18652-18666 Available from https://proceedings.mlr.press/v202/lawless23a.html.

Related Material