Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation

Ian En-Hsu Yen, Dmitry Malioutov, Abhishek Kumar
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1260-1269, 2016.

Abstract

In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well-known as facility location in the operations research literature. While the problem has well-developed convex relaxation with approximation and recovery guarantees, it has number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e. 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain closed-form solution for each block sub-problem, and exploits low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization and active learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-yen16, title = {Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation}, author = {Yen, Ian En-Hsu and Malioutov, Dmitry and Kumar, Abhishek}, booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics}, pages = {1260--1269}, year = {2016}, editor = {Gretton, Arthur and Robert, Christian C.}, volume = {51}, series = {Proceedings of Machine Learning Research}, address = {Cadiz, Spain}, month = {09--11 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v51/yen16.pdf}, url = {https://proceedings.mlr.press/v51/yen16.html}, abstract = {In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well-known as facility location in the operations research literature. While the problem has well-developed convex relaxation with approximation and recovery guarantees, it has number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e. 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain closed-form solution for each block sub-problem, and exploits low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization and active learning.} }
Endnote
%0 Conference Paper %T Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation %A Ian En-Hsu Yen %A Dmitry Malioutov %A Abhishek Kumar %B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2016 %E Arthur Gretton %E Christian C. Robert %F pmlr-v51-yen16 %I PMLR %P 1260--1269 %U https://proceedings.mlr.press/v51/yen16.html %V 51 %X In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well-known as facility location in the operations research literature. While the problem has well-developed convex relaxation with approximation and recovery guarantees, it has number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e. 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain closed-form solution for each block sub-problem, and exploits low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization and active learning.
RIS
TY - CPAPER TI - Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation AU - Ian En-Hsu Yen AU - Dmitry Malioutov AU - Abhishek Kumar BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics DA - 2016/05/02 ED - Arthur Gretton ED - Christian C. Robert ID - pmlr-v51-yen16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 51 SP - 1260 EP - 1269 L1 - http://proceedings.mlr.press/v51/yen16.pdf UR - https://proceedings.mlr.press/v51/yen16.html AB - In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well-known as facility location in the operations research literature. While the problem has well-developed convex relaxation with approximation and recovery guarantees, it has number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e. 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain closed-form solution for each block sub-problem, and exploits low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization and active learning. ER -
APA
Yen, I.E., Malioutov, D. & Kumar, A.. (2016). Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1260-1269 Available from https://proceedings.mlr.press/v51/yen16.html.

Related Material