Tracking Most Significant Arm Switches in Bandits

Joe Suk, Samory Kpotufe
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:2160-2182, 2022.

Abstract

In \emph{bandits with distribution shifts}, one aims to automatically adapt to unknown changes in reward distribution, and \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{1/3}T^{2/3})$ when expressed in terms of \emph{total variation} $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.
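As a quick reading aid (not part of the published abstract), the rate comparison claimed above can be summarized in one display. Here $\tilde{L}$ is shorthand we introduce for the number of significant shifts, and the inequalities are meant up to logarithmic factors, as stated in the abstract:

$$ \mathbb{E}[\mathrm{Regret}(T)] \;\lesssim\; \sqrt{\tilde{L}\,T} \;\le\; \min\Bigl\{ \sqrt{S\,T},\; V^{1/3} T^{2/3} \Bigr\}, \qquad \tilde{L} \le S \le L. $$

In words: restarting only at significant shifts is never worse than restarting at every best-arm switch or tracking total variation, and is strictly better whenever few of the $L$ distribution changes are actually severe.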

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-suk22a,
  title     = {Tracking Most Significant Arm Switches in Bandits},
  author    = {Suk, Joe and Kpotufe, Samory},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {2160--2182},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/suk22a/suk22a.pdf},
  url       = {https://proceedings.mlr.press/v178/suk22a.html},
  abstract  = {In \emph{bandit with distribution shifts}, one aims to automatically adapt to unknown changes in reward distribution, and \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward overtime. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of \emph{total variation} $V$ (which aggregates differences overtime). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.}
}
Endnote
%0 Conference Paper
%T Tracking Most Significant Arm Switches in Bandits
%A Joe Suk
%A Samory Kpotufe
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-suk22a
%I PMLR
%P 2160--2182
%U https://proceedings.mlr.press/v178/suk22a.html
%V 178
%X In \emph{bandit with distribution shifts}, one aims to automatically adapt to unknown changes in reward distribution, and \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also involving large aggregate differences in reward overtime. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of \emph{total variation} $V$ (which aggregates differences overtime). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.
APA
Suk, J. & Kpotufe, S. (2022). Tracking Most Significant Arm Switches in Bandits. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:2160-2182. Available from https://proceedings.mlr.press/v178/suk22a.html.