The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

Sepehr Assadi, Chen Wang
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:311-358, 2024.

Abstract

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-assadi24a,
  title     = {The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits},
  author    = {Assadi, Sepehr and Wang, Chen},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {311--358},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/assadi24a/assadi24a.pdf},
  url       = {https://proceedings.mlr.press/v247/assadi24a.html},
  abstract  = {We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].}
}
Endnote
%0 Conference Paper
%T The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
%A Sepehr Assadi
%A Chen Wang
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-assadi24a
%I PMLR
%P 311--358
%U https://proceedings.mlr.press/v247/assadi24a.html
%V 247
%X We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].
APA
Assadi, S. & Wang, C. (2024). The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:311-358. Available from https://proceedings.mlr.press/v247/assadi24a.html.