Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes

Peter Auer, Pratik Gajane, Ronald Ortner
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:138-158, 2019.

Abstract

We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called AdSwitch and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
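To make the setting concrete, here is a small toy sketch of a piecewise-stationary Bernoulli bandit and the dynamic-oracle benchmark that the regret above is measured against. This is not the AdSwitch algorithm from the paper; the stationary epsilon-greedy learner below merely illustrates why an algorithm that ignores distribution changes accumulates large regret after a change point. All function names and parameters are illustrative.

```python
import random

def piecewise_means(segments):
    """Flatten (length, arm_means) segments into a per-round list of arm means."""
    schedule = []
    for length, arm_means in segments:
        schedule.extend([list(arm_means)] * length)
    return schedule

def dynamic_oracle(schedule):
    """Expected reward of the optimal non-stationary policy: best arm each round."""
    return sum(max(means) for means in schedule)

def epsilon_greedy_regret(schedule, epsilon=0.1, seed=0):
    """Expected dynamic regret of a stationary epsilon-greedy learner
    that never forgets old observations (so it adapts poorly to changes)."""
    rng = random.Random(seed)
    k = len(schedule[0])
    counts = [0] * k
    sums = [0.0] * k
    expected_reward = 0.0
    for means in schedule:
        if rng.random() < epsilon or min(counts) == 0:
            arm = rng.randrange(k)            # explore
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])  # exploit
        expected_reward += means[arm]         # accumulate expected (not sampled) reward
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli feedback
        counts[arm] += 1
        sums[arm] += reward
    return dynamic_oracle(schedule) - expected_reward

# Two arms; their means swap abruptly halfway through the horizon.
schedule = piecewise_means([(500, [0.9, 0.1]), (500, [0.1, 0.9])])
```

Because the learner's running averages are dominated by pre-change rewards, it keeps pulling the formerly best arm well after the switch, which is exactly the failure mode that change-adaptive algorithms are designed to avoid.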

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-auer19a,
  title     = {Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes},
  author    = {Auer, Peter and Gajane, Pratik and Ortner, Ronald},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {138--158},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/auer19a/auer19a.pdf},
  url       = {https://proceedings.mlr.press/v99/auer19a.html},
  abstract  = {We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called AdSwitch and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.}
}
Endnote
%0 Conference Paper
%T Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
%A Peter Auer
%A Pratik Gajane
%A Ronald Ortner
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-auer19a
%I PMLR
%P 138--158
%U https://proceedings.mlr.press/v99/auer19a.html
%V 99
%X We consider the variant of the stochastic multi-armed bandit problem where the stochastic reward distributions may change abruptly several times. In contrast to previous work, we are able to achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called AdSwitch and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
APA
Auer, P., Gajane, P. & Ortner, R. (2019). Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:138-158. Available from https://proceedings.mlr.press/v99/auer19a.html.