Leveraging Similarities in Multi-Armed Bandits

Khaled Eldowa; Thibaud Rahier; Augustin Cablant; Panayotis Mertikopoulos; Pierre Gaillard

Leveraging Similarities in Multi-Armed Bandits

Khaled Eldowa, Thibaud Rahier, Augustin Cablant, Panayotis Mertikopoulos, Pierre Gaillard

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:2259-2306, 2026.

Abstract

In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-eldowa26a,
  title = 	 {Leveraging Similarities in Multi-Armed Bandits},
  author =       {Eldowa, Khaled and Rahier, Thibaud and Cablant, Augustin and Mertikopoulos, Panayotis and Gaillard, Pierre},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {2259--2306},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/eldowa26a/eldowa26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/eldowa26a.html},
  abstract = 	 {In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.}
}

Endnote

%0 Conference Paper
%T Leveraging Similarities in Multi-Armed Bandits
%A Khaled Eldowa
%A Thibaud Rahier
%A Augustin Cablant
%A Panayotis Mertikopoulos
%A Pierre Gaillard
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-eldowa26a
%I PMLR
%P 2259--2306
%U https://proceedings.mlr.press/v336/eldowa26a.html
%V 336
%X In many online learning and bandit problems, the actions we consider possess inherent similarities–for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.

APA

Eldowa, K., Rahier, T., Cablant, A., Mertikopoulos, P. & Gaillard, P.. (2026). Leveraging Similarities in Multi-Armed Bandits. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:2259-2306 Available from https://proceedings.mlr.press/v336/eldowa26a.html.

Related Material

Download PDF