Sequential no-Substitution k-Median-Clustering

Tom Hess, Sivan Sabato
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:962-972, 2020.

Abstract

We study the sample-based k-median clustering objective under a sequential setting without substitutions. In this setting, an i.i.d. sequence of examples is observed. An example can be selected as a center only immediately after it is observed, and it cannot be substituted later. The goal is to select a set of centers with a good k-median cost on the distribution which generated the sequence. We provide an efficient algorithm for this setting, and show that its multiplicative approximation factor is twice the approximation factor of an efficient offline algorithm. In addition, we show that if efficiency requirements are removed, there is an algorithm that can obtain the same approximation factor as the best offline algorithm. We demonstrate in experiments the performance of the efficient algorithm on real data sets. Our code is available at https://github.com/tomhess/No_Substitution_K_Median.
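To make the setting concrete, here is a small illustrative sketch (not the paper's algorithm): a trivial no-substitution selector that commits to the first k observed examples as centers, together with the empirical k-median cost it is evaluated on. All names here are hypothetical and chosen for illustration only.

```python
import random

def kmedian_cost(points, centers):
    """Empirical k-median cost: average distance from each point
    to its nearest selected center."""
    return sum(min(abs(p - c) for c in centers) for p in points) / len(points)

def no_substitution_baseline(stream, k):
    """Trivial no-substitution selector: take the first k observed
    examples as centers. Each center is committed immediately after
    it is observed and can never be substituted later, matching the
    constraint of the sequential setting. This is a naive baseline,
    not the algorithm proposed in the paper."""
    centers = []
    for x in stream:
        if len(centers) < k:
            centers.append(x)
    return centers

# Toy i.i.d. sample from a two-cluster mixture on the real line.
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(500)] + \
         [random.gauss(10, 1) for _ in range(500)]
random.shuffle(sample)

centers = no_substitution_baseline(sample, k=2)
print(len(centers), round(kmedian_cost(sample, centers), 3))
```

A good no-substitution algorithm must do better than this baseline: it has to decide online, for each arriving example, whether to commit to it as a center, while still guaranteeing a cost close to the best offline choice.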

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-hess20a,
  title     = {Sequential no-Substitution k-Median-Clustering},
  author    = {Hess, Tom and Sabato, Sivan},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {962--972},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/hess20a/hess20a.pdf},
  url       = {https://proceedings.mlr.press/v108/hess20a.html},
  abstract  = {We study the sample-based k-median clustering objective under a sequential setting without substitutions. In this setting, an i.i.d. sequence of examples is observed. An example can be selected as a center only immediately after it is observed, and it cannot be substituted later. The goal is to select a set of centers with a good k-median cost on the distribution which generated the sequence. We provide an efficient algorithm for this setting, and show that its multiplicative approximation factor is twice the approximation factor of an efficient offline algorithm. In addition, we show that if efficiency requirements are removed, there is an algorithm that can obtain the same approximation factor as the best offline algorithm. We demonstrate in experiments the performance of the efficient algorithm on real data sets. Our code is available at https://github.com/tomhess/No_Substitution_K_Median.}
}
Endnote
%0 Conference Paper
%T Sequential no-Substitution k-Median-Clustering
%A Tom Hess
%A Sivan Sabato
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-hess20a
%I PMLR
%P 962--972
%U https://proceedings.mlr.press/v108/hess20a.html
%V 108
%X We study the sample-based k-median clustering objective under a sequential setting without substitutions. In this setting, an i.i.d. sequence of examples is observed. An example can be selected as a center only immediately after it is observed, and it cannot be substituted later. The goal is to select a set of centers with a good k-median cost on the distribution which generated the sequence. We provide an efficient algorithm for this setting, and show that its multiplicative approximation factor is twice the approximation factor of an efficient offline algorithm. In addition, we show that if efficiency requirements are removed, there is an algorithm that can obtain the same approximation factor as the best offline algorithm. We demonstrate in experiments the performance of the efficient algorithm on real data sets. Our code is available at https://github.com/tomhess/No_Substitution_K_Median.
APA
Hess, T. & Sabato, S. (2020). Sequential no-Substitution k-Median-Clustering. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:962-972. Available from https://proceedings.mlr.press/v108/hess20a.html.