The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

Sepehr Assadi; Chen Wang

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

Sepehr Assadi, Chen Wang

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:311-358, 2024.

Abstract

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of

$O(n/\Delta^2)$ requires

$\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here,

$n$ is the number of arms and

$\Delta$ is the reward gap between the best and the second-best arms. Our result matches the

$O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses

$O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-assadi24a,
  title = 	 {The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits},
  author =       {Assadi, Sepehr and Wang, Chen},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {311--358},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/assadi24a/assadi24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/assadi24a.html},
  abstract = 	 {We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].}
}

Endnote

%0 Conference Paper
%T The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
%A Sepehr Assadi
%A Chen Wang
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-assadi24a
%I PMLR
%P 311--358
%U https://proceedings.mlr.press/v247/assadi24a.html
%V 247
%X We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$ pass algorithm of Jin et al. [ICML’21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC’20].

APA


Assadi, S. & Wang, C.. (2024). The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:311-358 Available from https://proceedings.mlr.press/v247/assadi24a.html.

Related Material

Download PDF