The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Rafael Barbosa; Alina Ene; Huy Nguyen; Justin Ward

The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Rafael Barbosa, Alina Ene, Huy Nguyen, Justin Ward

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1236-1244, 2015.

Abstract

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-barbosa15,
  title = 	 {The Power of Randomization: Distributed Submodular Maximization on Massive Datasets},
  author = 	 {Barbosa, Rafael and Ene, Alina and Nguyen, Huy and Ward, Justin},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1236--1244},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/barbosa15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/barbosa15.html},
  abstract = 	 {A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.}
}

Endnote

%0 Conference Paper
%T The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
%A Rafael Barbosa
%A Alina Ene
%A Huy Nguyen
%A Justin Ward
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-barbosa15
%I PMLR
%P 1236--1244
%U https://proceedings.mlr.press/v37/barbosa15.html
%V 37
%X A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.

RIS


TY  - CPAPER
TI  - The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
AU  - Rafael Barbosa
AU  - Alina Ene
AU  - Huy Nguyen
AU  - Justin Ward
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-barbosa15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1236
EP  - 1244
L1  - http://proceedings.mlr.press/v37/barbosa15.pdf
UR  - https://proceedings.mlr.press/v37/barbosa15.html
AB  - A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.
ER  -

APA


Barbosa, R., Ene, A., Nguyen, H. & Ward, J.. (2015). The Power of Randomization: Distributed Submodular Maximization on Massive Datasets. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1236-1244 Available from https://proceedings.mlr.press/v37/barbosa15.html.

The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

Abstract

Cite this Paper

Related Material