Leveraging Similarities in Multi-Armed Bandits

Khaled Eldowa, Thibaud Rahier, Augustin Cablant, Panayotis Mertikopoulos, Pierre Gaillard
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:2259-2306, 2026.

Abstract

In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-eldowa26a, title = {Leveraging Similarities in Multi-Armed Bandits}, author = {Eldowa, Khaled and Rahier, Thibaud and Cablant, Augustin and Mertikopoulos, Panayotis and Gaillard, Pierre}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {2259--2306}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/eldowa26a/eldowa26a.pdf}, url = {https://proceedings.mlr.press/v336/eldowa26a.html}, abstract = {In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.} }
Endnote
%0 Conference Paper %T Leveraging Similarities in Multi-Armed Bandits %A Khaled Eldowa %A Thibaud Rahier %A Augustin Cablant %A Panayotis Mertikopoulos %A Pierre Gaillard %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-eldowa26a %I PMLR %P 2259--2306 %U https://proceedings.mlr.press/v336/eldowa26a.html %V 336 %X In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.
APA
Eldowa, K., Rahier, T., Cablant, A., Mertikopoulos, P. & Gaillard, P.. (2026). Leveraging Similarities in Multi-Armed Bandits. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:2259-2306 Available from https://proceedings.mlr.press/v336/eldowa26a.html.

Related Material