Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:138-158, 2019.
We consider the variant of the stochastic multi-armed bandit problem where the reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for its regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
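The piecewise-stationary setting described above can be illustrated with a minimal simulated environment: arm reward distributions stay fixed between change points and switch abruptly at them. This is only a sketch of the problem setting, not of the ADSWITCH algorithm; the class and parameter names below are our own illustrative choices, not from the paper.

```python
import random

class SwitchingBandit:
    """Piecewise-stationary Bernoulli bandit (illustrative sketch).

    Arm means stay constant within a segment and change abruptly at
    the given change points, as in the abruptly-changing setting."""

    def __init__(self, mean_segments, change_points):
        # mean_segments[i]: per-arm Bernoulli means used in segment i.
        # change_points: rounds at which the means switch to the next segment.
        self.mean_segments = mean_segments
        self.change_points = change_points
        self.t = 0  # current round

    def current_means(self):
        # The active segment is determined by how many change points have passed.
        segment = sum(1 for c in self.change_points if self.t >= c)
        return self.mean_segments[segment]

    def pull(self, arm):
        # Draw a Bernoulli reward from the currently active distribution.
        means = self.current_means()
        self.t += 1
        return 1 if random.random() < means[arm] else 0

# Usage: two arms whose best-arm identity flips at round 50.
env = SwitchingBandit(mean_segments=[[0.9, 0.1], [0.1, 0.9]],
                      change_points=[50])
```

Against such an environment, the regret benchmark in the abstract is the optimal non-stationary policy, i.e. one that plays the best arm of each segment.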