The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Rafael Barbosa, Alina Ene, Huy Nguyen, Justin Ward
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1236-1244, 2015.

Abstract

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-barbosa15, title = {The Power of Randomization: Distributed Submodular Maximization on Massive Datasets}, author = {Barbosa, Rafael and Ene, Alina and Nguyen, Huy and Ward, Justin}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {1236--1244}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/barbosa15.pdf}, url = { http://proceedings.mlr.press/v37/barbosa15.html }, abstract = {A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.} }
Endnote
%0 Conference Paper %T The Power of Randomization: Distributed Submodular Maximization on Massive Datasets %A Rafael Barbosa %A Alina Ene %A Huy Nguyen %A Justin Ward %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-barbosa15 %I PMLR %P 1236--1244 %U http://proceedings.mlr.press/v37/barbosa15.html %V 37 %X A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.
RIS
TY - CPAPER TI - The Power of Randomization: Distributed Submodular Maximization on Massive Datasets AU - Rafael Barbosa AU - Alina Ene AU - Huy Nguyen AU - Justin Ward BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-barbosa15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 1236 EP - 1244 L1 - http://proceedings.mlr.press/v37/barbosa15.pdf UR - http://proceedings.mlr.press/v37/barbosa15.html AB - A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting. ER -
APA
Barbosa, R., Ene, A., Nguyen, H. & Ward, J.. (2015). The Power of Randomization: Distributed Submodular Maximization on Massive Datasets. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1236-1244 Available from http://proceedings.mlr.press/v37/barbosa15.html .

Related Material