Optimal Streaming Algorithms for Multi-Armed Bandits

Tianyuan Jin, Keke Huang, Jing Tang, Xiaokui Xiao
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5045-5054, 2021.

Abstract

This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of n arms with reward distributions supported on [0,1] with unknown means. The arms arrive one by one, and the algorithm cannot access an arm unless it is stored in a limited-size memory. We first study the streaming \epsilon-top-k arms identification problem, which asks for k arms whose reward means are lower than that of the k-th best arm by at most \epsilon with probability at least 1-\delta. For general \epsilon \in (0,1), the existing solution for this problem assumes k = 1 and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log \frac{1}{\delta}) using O(\log^*(n)) memory and a single pass of the stream. We propose an algorithm that works for any k and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log\frac{k}{\delta}) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with probability at least 1-\delta, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(\log \Delta_2^{-1}) passes, where \Delta_2 is the gap between the mean of the best arm and that of the second best arm.
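To make the streaming setting concrete, the following is a minimal Python sketch of a naive baseline for the k = 1 case. It is not the algorithm proposed in the paper. It pulls every arriving arm the same number of times, chosen by a Hoeffding bound plus a union bound over the n arms, and keeps only the arm with the highest empirical mean. The names `Arm`, `pulls_per_arm`, and `naive_streaming_best_arm` are hypothetical and introduced only for illustration. Under these assumptions the baseline uses O(\frac{n}{\epsilon^2} \log \frac{n}{\delta}) samples, a \log n factor above the optimal bound quoted in the abstract.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical model of an arm: each call is one pull returning a reward in [0, 1].
Arm = Callable[[], float]


def pulls_per_arm(n: int, epsilon: float, delta: float) -> int:
    """Pulls per arm so that, by Hoeffding's inequality and a union bound,
    all n empirical means are within epsilon/2 of the true means
    with probability at least 1 - delta."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n / delta))


def naive_streaming_best_arm(
    stream: Iterable[Arm], n: int, epsilon: float, delta: float
) -> Tuple[Optional[int], float]:
    """Single pass over the stream, remembering only one stored arm.

    Illustrative baseline only (uniform sampling), not the paper's method.
    Returns the position in the stream and the empirical mean of the kept arm.
    """
    m = pulls_per_arm(n, epsilon, delta)
    best_index: Optional[int] = None
    best_mean = float("-inf")
    for index, arm in enumerate(stream):
        # The arriving arm is only accessible now; sample it m times.
        mean = sum(arm() for _ in range(m)) / m
        # Keep the arriving arm only if it beats the stored one;
        # the arm that loses is discarded and cannot be pulled again.
        if mean > best_mean:
            best_index, best_mean = index, mean
    return best_index, best_mean
```

If all empirical means are within \epsilon/2 of the true means, the arm kept at the end has a mean within \epsilon of the best arm, which is the k = 1 guarantee described above. The extra \log n factor comes from the union bound over all arms.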

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-jin21a,
  title = {Optimal Streaming Algorithms for Multi-Armed Bandits},
  author = {Jin, Tianyuan and Huang, Keke and Tang, Jing and Xiao, Xiaokui},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {5045--5054},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/jin21a/jin21a.pdf},
  url = {https://proceedings.mlr.press/v139/jin21a.html},
  abstract = {This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of n arms with reward distributions supported on [0,1] with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \epsilon-top-k arms identification problem, which asks for k arms whose reward means are lower than that of the k-th best arm by at most \epsilon with probability at least 1-\delta. For general \epsilon \in (0,1), the existing solution for this problem assumes k = 1 and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log \frac{1}{\delta}) using O(\log^*(n)) memory and a single pass of the stream. We propose an algorithm that works for any k and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log\frac{k}{\delta}) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least 1-\delta probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(\log \Delta_2^{-1}) passes, where \Delta_2 is the gap between the mean of the best arm and that of the second best arm.}
}
Endnote
%0 Conference Paper
%T Optimal Streaming Algorithms for Multi-Armed Bandits
%A Tianyuan Jin
%A Keke Huang
%A Jing Tang
%A Xiaokui Xiao
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-jin21a
%I PMLR
%P 5045--5054
%U https://proceedings.mlr.press/v139/jin21a.html
%V 139
%X This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of n arms with reward distributions supported on [0,1] with unknown means. The arms in the stream are arriving one by one, and the algorithm cannot access an arm unless it is stored in a limited size memory. We first study the streaming \epsilon-top-k arms identification problem, which asks for k arms whose reward means are lower than that of the k-th best arm by at most \epsilon with probability at least 1-\delta. For general \epsilon \in (0,1), the existing solution for this problem assumes k = 1 and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log \frac{1}{\delta}) using O(\log^*(n)) memory and a single pass of the stream. We propose an algorithm that works for any k and achieves the optimal sample complexity O(\frac{n}{\epsilon^2} \log\frac{k}{\delta}) using a single-arm memory and a single pass of the stream. Second, we study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least 1-\delta probability, using a single-arm memory and as few passes of the input stream as possible. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(\log \Delta_2^{-1}) passes, where \Delta_2 is the gap between the mean of the best arm and that of the second best arm.
APA
Jin, T., Huang, K., Tang, J., & Xiao, X. (2021). Optimal Streaming Algorithms for Multi-Armed Bandits. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5045-5054. Available from https://proceedings.mlr.press/v139/jin21a.html.
