Tracking Most Significant Arm Switches in Bandits
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:2160-2182, 2022.
Abstract
In \emph{bandits with distribution shifts}, one aims to automatically adapt to unknown changes in reward distribution, and \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S \ll L$ only counts best arm switches, while at the same time always faster than the optimal $O(V^{1/3}T^{2/3})$ when expressed in terms of \emph{total variation} $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.
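For orientation, the rate comparisons claimed above can be summarized in the following display (a rough sketch using the abstract's symbols; constants and logarithmic factors are suppressed, and the precise guarantees are the ones stated in the paper's theorems):

% Symbols as in the abstract: T rounds, L distribution changes,
% S <= L best arm switches, V total variation; log factors omitted.
\[
  \mathrm{Regret} \;\lesssim\; \min\Bigl\{\sqrt{S\,T},\; V^{1/3}\,T^{2/3}\Bigr\} \;\le\; \sqrt{L\,T},
\]
% The last inequality follows from S <= L, so sqrt(ST) <= sqrt(LT);
% the gap is large whenever S << L, i.e., when few of the L changes
% actually switch the best arm.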