Stochastic Multi-armed Bandits in Constant Space
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:386-394, 2018.
Abstract
We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret $\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T$, where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log (1/\Delta))$ factor of the optimal regret achievable without space constraints.
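To make the constant-space constraint concrete, the sketch below shows one generic way to play bandits while storing only a handful of counters: keep a single "champion" arm and let each other arm challenge it in turn, discarding a challenger's statistics once the duel is decided. This is an illustrative, hypothetical sketch under assumed parameters (the `pull` oracle, the doubling duel budget, and the confidence radius), not the paper's algorithm, and it does not claim the regret bound stated in the abstract.

```python
import math

def constant_space_bandit(pull, K, T):
    """Play T rounds over K arms with rewards in [0, 1] using O(1) words:
    only the champion's running statistics and the current challenger's
    statistics are ever stored. `pull(i)` is an assumed reward oracle."""
    if K == 1:
        return 0
    champion, champ_sum, champ_n = 0, 0.0, 0
    t = 0
    while t < T:
        for challenger in range(K):
            if challenger == champion:
                continue
            chal_sum, chal_n = 0.0, 0
            budget = 1  # duel length doubles until the comparison is resolved
            while t < T:
                for _ in range(budget):
                    if t >= T:
                        break
                    champ_sum += pull(champion); champ_n += 1
                    chal_sum += pull(challenger); chal_n += 1
                    t += 2
                # Hoeffding-style confidence radius for the challenger's mean.
                rad = math.sqrt(2.0 * math.log(max(T, 2)) / max(chal_n, 1))
                gap = chal_sum / max(chal_n, 1) - champ_sum / max(champ_n, 1)
                if gap > rad:
                    # Challenger is confidently better: it becomes the champion.
                    champion, champ_sum, champ_n = challenger, chal_sum, chal_n
                    break
                if gap < -rad:
                    # Challenger is confidently worse: drop its statistics.
                    break
                budget *= 2
            if t >= T:
                break
    return champion

# Example usage with hypothetical Bernoulli arms:
#   import random
#   means = [0.3, 0.5, 0.7]
#   best = constant_space_bandit(lambda i: float(random.random() < means[i]), K=3, T=100000)
```

The point of the sketch is only that the memory footprint is independent of $K$: at any moment the state consists of the champion's index and counts plus one challenger's counts, so the space used is a constant number of words regardless of the number of arms.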