Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:35525-35547, 2023.
Abstract
Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively, and recent work has shown that algorithms with $o(K)$ memory have to incur $\Omega(T^{2/3})$ regret, where $K$ and $T$ are the numbers of arms and trials, respectively. However, the previous best regret upper bound is still $O(K^{1/3} T^{2/3} \log^{1/3}(T))$, achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to $\Omega(K^{1/3} T^{2/3})$ for algorithms with $o(K)$ memory. We then show that the $\log^{1/3}(T)$ factor is not necessary by designing algorithms that use at most $O(\log^{*}(K))$-arm memory and achieve $O(K^{1/3} T^{2/3})$ expected regret, based on streaming $\varepsilon$-best arm algorithms. We further test the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%.
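For intuition, below is a minimal sketch (not the paper's code) of the uniform exploration benchmark the abstract refers to: a single-pass explore-then-commit strategy that pulls each arm roughly $(T/K)^{2/3} \log^{1/3}(T)$ times while keeping only the running best arm in memory, which yields the $O(K^{1/3} T^{2/3} \log^{1/3}(T))$ regret bound. The `BernoulliArm` class and its `pull()` interface are illustrative assumptions, not part of the paper.

```python
import math
import random

class BernoulliArm:
    """Hypothetical arm interface: pull() returns a reward in {0, 1}."""
    def __init__(self, p):
        self.p = p
    def pull(self):
        return 1.0 if random.random() < self.p else 0.0

def uniform_exploration(arms, T):
    """Single-pass explore-then-commit over a stream of K arms (sketch).

    Each arm is pulled m ~ (T/K)^{2/3} * log^{1/3}(T) times; only the
    running best arm is retained, so the algorithm stores O(1) arms.
    The remaining trials are spent on the empirical best arm.
    """
    K = len(arms)
    m = max(1, math.ceil((T / K) ** (2 / 3) * math.log(T) ** (1 / 3)))
    best_arm, best_mean, pulls = None, float("-inf"), 0

    for arm in arms:                      # one pass over the arm stream
        total, n = 0.0, 0
        while n < m and pulls < T:
            total += arm.pull()
            n, pulls = n + 1, pulls + 1
        if n > 0 and total / n > best_mean:
            best_arm, best_mean = arm, total / n

    while pulls < T:                      # commit to the empirical best arm
        best_arm.pull()
        pulls += 1

# usage: uniform_exploration([BernoulliArm(p) for p in (0.3, 0.5, 0.9)], T=10_000)
```

The paper's own algorithms improve on this benchmark by replacing the uniform exploration phase with streaming $\varepsilon$-best arm identification, which removes the $\log^{1/3}(T)$ factor while using at most $O(\log^{*}(K))$-arm memory.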