Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

Maximilian Graf, Thuot Victor, Nicolas Verzelen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:20296-20325, 2025.

Abstract

We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item’s feature. The learner’s objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least $1-\delta$, we obtain an accurate recovery of the partition and derive an upper bound on the required budget . Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-graf25a, title = {Clustering Items through Bandit Feedback: Finding the Right Feature out of Many}, author = {Graf, Maximilian and Victor, Thuot and Verzelen, Nicolas}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {20296--20325}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/graf25a/graf25a.pdf}, url = {https://proceedings.mlr.press/v267/graf25a.html}, abstract = {We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item’s feature. The learner’s objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least $1-\delta$, we obtain an accurate recovery of the partition and derive an upper bound on the required budget . Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.} }
Endnote
%0 Conference Paper %T Clustering Items through Bandit Feedback: Finding the Right Feature out of Many %A Maximilian Graf %A Thuot Victor %A Nicolas Verzelen %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-graf25a %I PMLR %P 20296--20325 %U https://proceedings.mlr.press/v267/graf25a.html %V 267 %X We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item’s feature. The learner’s objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least $1-\delta$, we obtain an accurate recovery of the partition and derive an upper bound on the required budget . Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.
APA
Graf, M., Victor, T. & Verzelen, N.. (2025). Clustering Items through Bandit Feedback: Finding the Right Feature out of Many. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:20296-20325 Available from https://proceedings.mlr.press/v267/graf25a.html.

Related Material