Stochastic Multi-armed Bandits in Constant Space

David Liau, Zhao Song, Eric Price, Ger Yang
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:386-394, 2018.

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret $\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{∆}\log T$ where $\Delta_i$ is the gap between the best arm and arm $i$ and $∆$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log (1/∆))$ factor of the optimum regret possible without space constraints.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-liau18a, title = {Stochastic Multi-armed Bandits in Constant Space}, author = {Liau, David and Song, Zhao and Price, Eric and Yang, Ger}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {386--394}, year = {2018}, editor = {Storkey, Amos and Perez-Cruz, Fernando}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/liau18a/liau18a.pdf}, url = {https://proceedings.mlr.press/v84/liau18a.html}, abstract = {We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret $\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{∆}\log T$ where $\Delta_i$ is the gap between the best arm and arm $i$ and $∆$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log (1/∆))$ factor of the optimum regret possible without space constraints.} }
Endnote
%0 Conference Paper %T Stochastic Multi-armed Bandits in Constant Space %A David Liau %A Zhao Song %A Eric Price %A Ger Yang %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-liau18a %I PMLR %P 386--394 %U https://proceedings.mlr.press/v84/liau18a.html %V 84 %X We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret $\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{∆}\log T$ where $\Delta_i$ is the gap between the best arm and arm $i$ and $∆$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log (1/∆))$ factor of the optimum regret possible without space constraints.
APA
Liau, D., Song, Z., Price, E. & Yang, G.. (2018). Stochastic Multi-armed Bandits in Constant Space. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:386-394 Available from https://proceedings.mlr.press/v84/liau18a.html.

Related Material