A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Chris Dann, Chen-Yu Wei, Julian Zimmert
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:5503-5570, 2023.

Abstract

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialized potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can obtain $O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a generally applicable reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) algorithms. We showcase the capability of this reduction by transforming existing algorithms that only achieve worst-case guarantees into new best-of-both-worlds algorithms in the setting of contextual bandits, graph bandits and tabular Markov decision processes.

Cite this Paper
BibTeX
@InProceedings{pmlr-v195-dann23a,
  title     = {A Blackbox Approach to Best of Both Worlds in Bandits and Beyond},
  author    = {Dann, Chris and Wei, Chen-Yu and Zimmert, Julian},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {5503--5570},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/dann23a/dann23a.pdf},
  url       = {https://proceedings.mlr.press/v195/dann23a.html},
  abstract  = {Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialized potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can obtain $O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a generally applicable reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) algorithms. We showcase the capability of this reduction by transforming existing algorithms that only achieve worst-case guarantees into new best-of-both-worlds algorithms in the setting of contextual bandits, graph bandits and tabular Markov decision processes.}
}
Endnote
%0 Conference Paper
%T A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
%A Chris Dann
%A Chen-Yu Wei
%A Julian Zimmert
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-dann23a
%I PMLR
%P 5503--5570
%U https://proceedings.mlr.press/v195/dann23a.html
%V 195
%X Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialized potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can obtain $O(\log(T))$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a generally applicable reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) algorithms. We showcase the capability of this reduction by transforming existing algorithms that only achieve worst-case guarantees into new best-of-both-worlds algorithms in the setting of contextual bandits, graph bandits and tabular Markov decision processes.
APA
Dann, C., Wei, C.-Y., & Zimmert, J. (2023). A Blackbox Approach to Best of Both Worlds in Bandits and Beyond. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:5503-5570. Available from https://proceedings.mlr.press/v195/dann23a.html.

Related Material