Best of Both Worlds: Regret Minimization versus Minimax Play

Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45185-45211, 2025.

Abstract

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy has full support, i.e., places positive probability on every action. In the context of zero-sum games with min-max value zero, in both normal and extensive form, we show that our results make it possible to risk at most $O(1)$ loss while still gaining $\Omega(T)$ from exploitable opponents, thereby combining the benefits of no-regret algorithms and minimax play.
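To make the two guarantees concrete, they can be written as follows in generic online-learning notation (our own shorthand, not necessarily the paper's), where $x_t$ is the learner's mixed strategy in round $t$, $\ell_t$ the loss vector observed through bandit feedback, $\bar{x}$ the given full-support comparator, and $\Delta$ the simplex of mixed strategies:

$$\sum_{t=1}^{T} \langle x_t - \bar{x}, \ell_t \rangle \le O(1) \qquad \text{and} \qquad \max_{x \in \Delta} \sum_{t=1}^{T} \langle x_t - x, \ell_t \rangle \le \tilde{O}(\sqrt{T}).$$

In a zero-sum game with min-max value zero, instantiating $\bar{x}$ as a minimax strategy turns the first bound into a guarantee of at most $O(1)$ total loss, while the second bound keeps the learner within $\tilde{O}(\sqrt{T})$ of the best fixed strategy in hindsight, and hence lets it gain $\Omega(T)$ whenever some fixed strategy gains $\Omega(T)$ against the opponent's play.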

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-muller25b,
  title = {Best of Both Worlds: Regret Minimization versus Minimax Play},
  author = {M\"{u}ller, Adrian and Schneider, Jon and Skoulakis, Stratis and Viano, Luca and Cevher, Volkan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {45185--45211},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/muller25b/muller25b.pdf},
  url = {https://proceedings.mlr.press/v267/muller25b.html},
  abstract = {In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.}
}
Endnote
%0 Conference Paper
%T Best of Both Worlds: Regret Minimization versus Minimax Play
%A Adrian Müller
%A Jon Schneider
%A Stratis Skoulakis
%A Luca Viano
%A Volkan Cevher
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-muller25b
%I PMLR
%P 45185--45211
%U https://proceedings.mlr.press/v267/muller25b.html
%V 267
%X In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, both in normal- and extensive form, we show that our results allow us to guarantee to risk at most $O(1)$ loss while being able to gain $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.
APA
Müller, A., Schneider, J., Skoulakis, S., Viano, L., & Cevher, V. (2025). Best of Both Worlds: Regret Minimization versus Minimax Play. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:45185-45211. Available from https://proceedings.mlr.press/v267/muller25b.html.
