Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams

Ashkan Norouzi-Fard; Jakub Tarnawski; Slobodan Mitrovic; Amir Zandieh; Aidasadat Mousavifar; Ola Svensson

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3829-3838, 2018.

Abstract

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
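To make the streaming setting concrete, here is a minimal Python sketch of a generic single-pass threshold rule for picking at most k elements under a monotone submodular objective. It is an illustration only: it is not Salsa and not the exact method of Badanidiyuru et al. (2014), and it carries no approximation guarantee as written. The names `threshold_stream`, `coverage`, and `TOY_SETS`, the fixed threshold `tau`, and the toy data are all assumptions introduced for this example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

# Hypothetical toy data: each stream element "covers" a set of items.
TOY_SETS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {1, 2},
    "d": {5, 6, 7, 8},
}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    """Toy monotone submodular objective: number of items covered."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= TOY_SETS[name]
    return float(len(covered))


def threshold_stream(
    stream: Iterable[Hashable],
    f: Callable[[FrozenSet[Hashable]], float],
    k: int,
    tau: float,
) -> List[Hashable]:
    """Single pass over the stream, storing at most k elements.

    An element is kept only if its marginal gain with respect to the
    current summary is at least tau (an assumed, fixed threshold).
    """
    summary: List[Hashable] = []
    value = f(frozenset())
    for element in stream:
        if len(summary) == k:
            break  # cardinality constraint reached
        gain = f(frozenset(summary) | {element}) - value
        if gain >= tau:
            summary.append(element)
            value += gain
    return summary


picked = threshold_stream(["a", "b", "c", "d"], coverage, k=2, tau=2.0)
print(picked, coverage(frozenset(picked)))
```

The sketch keeps only the current summary in memory and looks at each element once, which is the low-memory, single-pass regime the abstract describes. How to choose the threshold, and how to exploit random arrival order, is exactly where the cited algorithms differ from this toy version.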

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-norouzi-fard18a,
  title = 	 {Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams},
  author =       {Norouzi-Fard, Ashkan and Tarnawski, Jakub and Mitrovic, Slobodan and Zandieh, Amir and Mousavifar, Aidasadat and Svensson, Ola},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {3829--3838},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/norouzi-fard18a/norouzi-fard18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/norouzi-fard18a.html},
  abstract = 	 {Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.}
}

Endnote

%0 Conference Paper
%T Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
%A Ashkan Norouzi-Fard
%A Jakub Tarnawski
%A Slobodan Mitrovic
%A Amir Zandieh
%A Aidasadat Mousavifar
%A Ola Svensson
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-norouzi-fard18a
%I PMLR
%P 3829--3838
%U https://proceedings.mlr.press/v80/norouzi-fard18a.html
%V 80
%X Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.

APA


Norouzi-Fard, A., Tarnawski, J., Mitrovic, S., Zandieh, A., Mousavifar, A., & Svensson, O. (2018). Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3829-3838. Available from https://proceedings.mlr.press/v80/norouzi-fard18a.html.
