On hyperparameter tuning in general clustering problems

Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2996-3007, 2020.

Abstract

Tuning hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. However, the success of most clustering methods depends heavily on the correct choice of the involved hyperparameters. Take for example the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations of community detection in networks, or the bandwidth parameter needed in the Gaussian kernel used to construct similarity matrices for spectral clustering. Despite the popularity of these clustering algorithms, there are not many provable methods for tuning these hyperparameters. In this paper, we provide an overarching framework with provable guarantees for tuning hyperparameters in the above class of problems under two different models. Our framework can be augmented with a cross validation procedure to do model selection as well. In a variety of simulation and real data experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.
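One of the abstract's motivating examples is the bandwidth parameter of the Gaussian kernel used to build similarity matrices for spectral clustering. As a minimal illustration (not the paper's tuning method), the sketch below constructs such a similarity matrix with NumPy; the data and bandwidth value are arbitrary choices for demonstration:

```python
import numpy as np

def gaussian_similarity(X, bandwidth):
    """Pairwise Gaussian-kernel similarity matrix for the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

# Two well-separated 1-D clusters (illustrative data)
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

S = gaussian_similarity(X, bandwidth=0.5)
# With a well-chosen bandwidth, within-cluster similarities are near 1
# while cross-cluster similarities are near 0, so the block structure
# that spectral clustering exploits is visible in S.
print(S[0, 1], S[0, 3])
```

A poorly chosen bandwidth blurs (too large) or fragments (too small) this block structure, which is exactly why principled tuning of such hyperparameters matters.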

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-fan20b,
  title     = {On hyperparameter tuning in general clustering problems},
  author    = {Fan, Xinjie and Yue, Yuguang and Sarkar, Purnamrita and Wang, Y. X. Rachel},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2996--3007},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/fan20b/fan20b.pdf},
  url       = {http://proceedings.mlr.press/v119/fan20b.html},
  abstract  = {Tuning hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. However, the success of most clustering methods depends heavily on the correct choice of the involved hyperparameters. Take for example the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations of community detection in networks, or the bandwidth parameter needed in the Gaussian kernel used to construct similarity matrices for spectral clustering. Despite the popularity of these clustering algorithms, there are not many provable methods for tuning these hyperparameters. In this paper, we provide an overarching framework with provable guarantees for tuning hyperparameters in the above class of problems under two different models. Our framework can be augmented with a cross validation procedure to do model selection as well. In a variety of simulation and real data experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.}
}
Endnote
%0 Conference Paper %T On hyperparameter tuning in general clustering problems %A Xinjie Fan %A Yuguang Yue %A Purnamrita Sarkar %A Y. X. Rachel Wang %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-fan20b %I PMLR %P 2996--3007 %U http://proceedings.mlr.press/v119/fan20b.html %V 119 %X Tuning hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. However, the success of most clustering methods depends heavily on the correct choice of the involved hyperparameters. Take for example the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations of community detection in networks, or the bandwidth parameter needed in the Gaussian kernel used to construct similarity matrices for spectral clustering. Despite the popularity of these clustering algorithms, there are not many provable methods for tuning these hyperparameters. In this paper, we provide an overarching framework with provable guarantees for tuning hyperparameters in the above class of problems under two different models. Our framework can be augmented with a cross validation procedure to do model selection as well. In a variety of simulation and real data experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.
APA
Fan, X., Yue, Y., Sarkar, P. & Wang, Y. X. R. (2020). On hyperparameter tuning in general clustering problems. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2996-3007. Available from http://proceedings.mlr.press/v119/fan20b.html.
