A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase; Shinji Ito; Idan Mehalel

A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase, Shinji Ito, Idan Mehalel

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:1075-1087, 2026.

Abstract

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-chase26a,
  title = 	 {A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice},
  author =       {Chase, Zachary and Ito, Shinji and Mehalel, Idan},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {1075--1087},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/chase26a/chase26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/chase26a.html},
  abstract = 	 {We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.}
}

Endnote

%0 Conference Paper
%T A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
%A Zachary Chase
%A Shinji Ito
%A Idan Mehalel
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-chase26a
%I PMLR
%P 1075--1087
%U https://proceedings.mlr.press/v336/chase26a.html
%V 336
%X We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of [Kale ’14]. The two bounds determine the minimax optimal expected regret to be $\Theta\left( \sqrt{T K \log \frac{N}{K} } \right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.

APA

Chase, Z., Ito, S. & Mehalel, I.. (2026). A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:1075-1087 Available from https://proceedings.mlr.press/v336/chase26a.html.

Related Material

Download PDF