One-shot Coresets: The Case of k-Clustering

Olivier Bachem, Mario Lucic, Silvio Lattanzi
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:784-792, 2018.

Abstract

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent - the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-bachem18a, title = {One-shot Coresets: The Case of k-Clustering}, author = {Bachem, Olivier and Lucic, Mario and Lattanzi, Silvio}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {784--792}, year = {2018}, editor = {Storkey, Amos and Perez-Cruz, Fernando}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/bachem18a/bachem18a.pdf}, url = {https://proceedings.mlr.press/v84/bachem18a.html}, abstract = {Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent - the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.} }
Endnote
%0 Conference Paper %T One-shot Coresets: The Case of k-Clustering %A Olivier Bachem %A Mario Lucic %A Silvio Lattanzi %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-bachem18a %I PMLR %P 784--792 %U https://proceedings.mlr.press/v84/bachem18a.html %V 84 %X Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent - the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.
APA
Bachem, O., Lucic, M. & Lattanzi, S.. (2018). One-shot Coresets: The Case of k-Clustering. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:784-792 Available from https://proceedings.mlr.press/v84/bachem18a.html.

Related Material