Clustering with bandit feedback: breaking down the computation/information gap
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:1221-1284, 2025.
Abstract
We investigate the Clustering with Bandit feedback Problem (CBP). A learner interacts with an N-armed stochastic bandit with d-dimensional subGaussian feedback. There exists a hidden partition of the arms into K groups, such that arms within the same group share the same mean vector. The learner's task is to recover this hidden partition with the smallest possible budget - i.e., the least number of observations - and with a probability of error smaller than a prescribed constant δ. In this paper, (i) we derive a non-asymptotic lower bound on the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes, thereby improving on the performance of a uniform sampling strategy. Importantly, and in contrast to the batch setting, we establish that there is no computation-information gap in the bandit setting.
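To make the setting concrete, the following is a minimal sketch of a CBP instance together with the uniform-sampling baseline that the abstract mentions: every arm is pulled the same number of times, empirical means are formed, and arms with nearby empirical means are merged. This is an illustration under simplifying assumptions (Gaussian noise, hand-picked separated group means, a hypothetical merge threshold), not the ACB algorithm from the paper.

```python
import numpy as np

def make_cbp_instance(N=12, K=3, d=4):
    # Hidden partition: arm i belongs to group i % K; arms in the same
    # group share a d-dimensional mean vector (well-separated here so the
    # naive baseline succeeds; the paper handles general separations).
    group_means = 10.0 * np.eye(K, d)
    labels = np.arange(N) % K
    return group_means, labels

def pull(arm, group_means, labels, rng):
    # One observation: the arm's mean plus unit-variance Gaussian
    # (hence subGaussian) noise.
    return group_means[labels[arm]] + rng.normal(0.0, 1.0, size=group_means.shape[1])

def uniform_sampling_cluster(group_means, labels, T=200, threshold=2.0, seed=0):
    # Uniform baseline: pull every arm T times (total budget N*T), then
    # greedily assign each arm to the cluster of the first earlier arm
    # whose empirical mean is within `threshold`.
    rng = np.random.default_rng(seed)
    N, d = len(labels), group_means.shape[1]
    est = np.zeros((N, d))
    for i in range(N):
        est[i] = np.mean(
            [pull(i, group_means, labels, rng) for _ in range(T)], axis=0
        )
    cluster = -np.ones(N, dtype=int)
    next_id = 0
    for i in range(N):
        for j in range(i):
            if np.linalg.norm(est[i] - est[j]) < threshold:
                cluster[i] = cluster[j]
                break
        if cluster[i] == -1:
            cluster[i] = next_id
            next_id += 1
    return cluster
```

An adaptive strategy such as ACB would instead allocate more pulls to hard-to-separate arms, which is what allows its budget to match the lower bound in most regimes.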