Information theoretic clustering via divergence maximization among clusters

Sahil Garg, Mina Dalirrooyfard, Anderson Schneider, Yeshaya Adler, Yuriy Nevmyvaka, Yu Chen, Fengpei Li, Guillermo Cecchi
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:624-634, 2023.

Abstract

Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal a priori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. Such an approach, however, does not explicitly promote any type of inter-cluster behavior. We instead propose to maximize the Kullback-Leibler (KL) divergence between the underlying data distributions associated with the clusters (referred to as cluster distributions). We show that this objective entails the mutual information criterion while also maximizing the cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the KL divergence between clusters in its dual form, leveraging deep neural nets as dual function approximators. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form reduces the clustering problem to one of optimally finding k-1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We show the empirical superiority of our approach over current state-of-the-art methods on the challenging task of clustering noisy time series, as observed in domains such as neuroscience, healthcare, financial markets, and spatio-temporal environmental dynamics.
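As a rough illustration of the two ingredients described in the abstract, the sketch below is not the authors' implementation: it assumes PyTorch, and the names DualNet, dv_from_scores, and best_single_cut are hypothetical. It estimates the KL divergence between two candidate clusters via the Donsker-Varadhan dual form, with a small neural net as the dual function, and then clusters by scanning a single cut point over the sorted 1-D dual scores (the k = 2 case of the k-1 cut-point search).

```python
# Illustrative sketch only (not the authors' code): a Donsker-Varadhan (DV)
# style lower bound on KL(P || Q) with a small neural net as the dual
# function, and a k = 2 clustering obtained by scanning a single cut point
# over the sorted 1-D dual scores.
import torch
import torch.nn as nn

class DualNet(nn.Module):
    """Scalar dual function T(x), approximated by a small MLP (hypothetical)."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def dv_from_scores(s_p: torch.Tensor, s_q: torch.Tensor) -> torch.Tensor:
    """DV lower bound on KL(P || Q) from dual scores:
    E_P[T] - log E_Q[exp(T)], with expectations replaced by sample means."""
    log_n_q = torch.log(torch.tensor(float(len(s_q))))
    return s_p.mean() - (torch.logsumexp(s_q, dim=0) - log_n_q)

def best_single_cut(scores: torch.Tensor):
    """k = 2 case: sort points by their 1-D dual score and pick the cut point
    whose induced two-cluster split has the largest DV estimate."""
    order = torch.argsort(scores)
    best_val, best_labels = -float("inf"), None
    for i in range(1, len(scores)):
        left, right = order[:i], order[i:]
        val = dv_from_scores(scores[left], scores[right]).item()
        if val > best_val:
            labels = torch.zeros(len(scores), dtype=torch.long)
            labels[right] = 1
            best_val, best_labels = val, labels
    return best_labels, best_val

if __name__ == "__main__":
    # Toy usage: random 2-D data, an untrained dual net (training of the dual
    # function is omitted here), and a single-cut clustering of its scores.
    torch.manual_seed(0)
    x = torch.randn(200, 2)
    T = DualNet(dim=2)
    with torch.no_grad():
        labels, val = best_single_cut(T(x))
    print("DV estimate at best cut:", val)
```

Training of the dual function and the near-optimal (submodular) search over multiple cut points are omitted; the sketch only shows how the 1-D dual scores reduce cluster assignment to choosing cut points.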

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-garg23a,
  title     = {Information theoretic clustering via divergence maximization among clusters},
  author    = {Garg, Sahil and Dalirrooyfard, Mina and Schneider, Anderson and Adler, Yeshaya and Nevmyvaka, Yuriy and Chen, Yu and Li, Fengpei and Cecchi, Guillermo},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {624--634},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/garg23a/garg23a.pdf},
  url       = {https://proceedings.mlr.press/v216/garg23a.html},
  abstract  = {Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal apriori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. Such an approach, however, does not explicitly promote any type of inter-cluster behavior. We instead propose to maximize the Kullback-Leibler divergence between the underlying data distributions associated to clusters (referred to as cluster distributions). We show it to entail the mutual information criterion along with maximizing cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the objective of KL-D between clusters in its dual form leveraging deep neural nets as a dual function approximator. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form simplifies the problem of clustering to one of optimally finding k-1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We show the empirical superiority of our approach w.r.t. current state-of-the-art methods on the challenging task of clustering noisy timeseries as observed in domains such as neuroscience, healthcare, financial markets, spatio-temporal environmental dynamics, etc.}
}
Endnote
%0 Conference Paper
%T Information theoretic clustering via divergence maximization among clusters
%A Sahil Garg
%A Mina Dalirrooyfard
%A Anderson Schneider
%A Yeshaya Adler
%A Yuriy Nevmyvaka
%A Yu Chen
%A Fengpei Li
%A Guillermo Cecchi
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-garg23a
%I PMLR
%P 624--634
%U https://proceedings.mlr.press/v216/garg23a.html
%V 216
%X Information-theoretic clustering is one of the most promising and principled approaches to finding clusters with minimal apriori assumptions. The key criterion therein is to maximize the mutual information between the data points and their cluster labels. Such an approach, however, does not explicitly promote any type of inter-cluster behavior. We instead propose to maximize the Kullback-Leibler divergence between the underlying data distributions associated to clusters (referred to as cluster distributions). We show it to entail the mutual information criterion along with maximizing cross entropy between the cluster distributions. For practical efficiency, we propose to empirically estimate the objective of KL-D between clusters in its dual form leveraging deep neural nets as a dual function approximator. Remarkably, our theoretical analysis establishes that estimating the divergence measure in its dual form simplifies the problem of clustering to one of optimally finding k-1 cut points for k clusters in the 1-D dual functional space. Overall, our approach enables linear-time clustering algorithms with theoretical guarantees of near-optimality, owing to the submodularity of the objective. We show the empirical superiority of our approach w.r.t. current state-of-the-art methods on the challenging task of clustering noisy timeseries as observed in domains such as neuroscience, healthcare, financial markets, spatio-temporal environmental dynamics, etc.
APA
Garg, S., Dalirrooyfard, M., Schneider, A., Adler, Y., Nevmyvaka, Y., Chen, Y., Li, F. & Cecchi, G. (2023). Information theoretic clustering via divergence maximization among clusters. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:624-634. Available from https://proceedings.mlr.press/v216/garg23a.html.