A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase, Shinji Ito, Idan Mehalel
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:1075-1087, 2026.

Abstract

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-chase26a, title = {A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice}, author = {Chase, Zachary and Ito, Shinji and Mehalel, Idan}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {1075--1087}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/chase26a/chase26a.pdf}, url = {https://proceedings.mlr.press/v336/chase26a.html}, abstract = {We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.} }
Endnote
%0 Conference Paper %T A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice %A Zachary Chase %A Shinji Ito %A Idan Mehalel %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-chase26a %I PMLR %P 1075--1087 %U https://proceedings.mlr.press/v336/chase26a.html %V 336 %X We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
APA
Chase, Z., Ito, S. & Mehalel, I.. (2026). A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:1075-1087 Available from https://proceedings.mlr.press/v336/chase26a.html.

Related Material