A Sharp Memory-Regret Trade-off for Multi-Pass Streaming Bandits

Arpit Agarwal; Sanjeev Khanna; Prathamesh Patil

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1423-1462, 2022.

Abstract

The stochastic $K$-armed bandit problem has been studied extensively due to its applications in domains ranging from online advertising to clinical trials. In practice, however, the number of arms can be very large, resulting in large memory requirements for processing them simultaneously. In this paper we consider a streaming setting where the arms are presented in a stream and the algorithm uses limited memory to process these arms. Here, the goal is not only to minimize regret, but also to do so using minimal memory. Previous algorithms for this problem operate in one of two settings: they either use $\Omega(\log \log T)$ passes over the stream \citep{rathod2021reducing, ChaudhuriKa20, Liau+18}, or just a single pass \citep{Maiti+21}. In this paper we study the trade-off between memory and regret when $B$ passes over the stream are allowed, for any $B \geq 1$, and establish \emph{tight} regret upper and lower bounds for any $B$-pass algorithm. Our results uncover a surprising \emph{sharp transition phenomenon}: $O(1)$ memory is sufficient to achieve $\widetilde{\Theta}\left(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\right)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost no impact on further reducing this regret unless $\Omega(K)$ memory is used. Our main technical contribution is the lower bound, which requires \emph{information-theoretic techniques} as well as ideas from \emph{round elimination} to show that the \emph{residual problem} remains challenging over subsequent passes.
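As a quick illustration of the bound in the abstract, the exponent of $T$ simplifies as $\frac{1}{2} + \frac{1}{2^{B+2}-2} = \frac{2^{B+1}}{2(2^{B+1}-1)} = \frac{2^B}{2^{B+1}-1}$. The short Python sketch below evaluates this exponent for small $B$ with exact fractions. It is an illustrative aid, not code from the paper: the helper name `regret_exponent` is hypothetical, and it only tracks the power of $T$ stated in the abstract, ignoring the logarithmic factors hidden by $\widetilde{\Theta}$ and any dependence on $K$.

```python
from fractions import Fraction


def regret_exponent(passes: int) -> Fraction:
    """Exponent of T in the B-pass bound T^(1/2 + 1/(2^(B+2) - 2)).

    Hypothetical helper for illustration only; polylog factors hidden by the
    tilde and any dependence on K are ignored.
    """
    if passes < 1:
        raise ValueError("the number of passes B must be at least 1")
    return Fraction(1, 2) + Fraction(1, 2 ** (passes + 2) - 2)


if __name__ == "__main__":
    for b in range(1, 6):
        e = regret_exponent(b)
        # Sanity check against the simplified closed form 2^B / (2^(B+1) - 1).
        assert e == Fraction(2 ** b, 2 ** (b + 1) - 1)
        print(f"B = {b}: exponent = {e} (about {float(e):.4f})")
```

For $B = 1$ the exponent is $2/3$, for $B = 2$ it is $4/7$, for $B = 3$ it is $8/15$, and it approaches $1/2$ as $B$ grows.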

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-agarwal22a,
  title     = {A Sharp Memory-Regret Trade-off for Multi-Pass Streaming Bandits},
  author    = {Agarwal, Arpit and Khanna, Sanjeev and Patil, Prathamesh},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {1423--1462},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/agarwal22a/agarwal22a.pdf},
  url       = {https://proceedings.mlr.press/v178/agarwal22a.html},
  abstract  = {The stochastic $K$-armed bandit problem has been studied extensively due to its applications in various domains ranging from online advertising to clinical trials. In practice however, the number of arms can be very large resulting in large memory requirements for simultaneously processing them. In this paper we consider a streaming setting where the arms are presented in a stream and the algorithm uses limited memory to process these arms. Here, the goal is not only to minimize regret, but also to do so in minimal memory. Previous algorithms for this problem operate in one of the two settings: they either use $\Omega(\log \log T)$ passes over the stream \citep{rathod2021reducing, ChaudhuriKa20, Liau+18}, or just a single pass \citep{Maiti+21}. In this paper we study the trade-off between memory and regret when $B$ passes over the stream are allowed, for any $B \geq 1$, and establish \emph{tight} regret upper and lower bounds for any $B$-pass algorithm. Our results uncover a surprising \emph{sharp transition phenomenon}: $O(1)$ memory is sufficient to achieve $\widetilde{\Theta}\left(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\right)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost no impact on further reducing this regret, unless we use $\Omega(K)$ memory. Our main technical contribution is our lower bound which requires the use of \emph{information-theoretic techniques} as well as ideas from \emph{round elimination} to show that the \emph{residual problem} remains challenging over subsequent passes.}
}

EndNote

%0 Conference Paper
%T A Sharp Memory-Regret Trade-off for Multi-Pass Streaming Bandits
%A Arpit Agarwal
%A Sanjeev Khanna
%A Prathamesh Patil
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-agarwal22a
%I PMLR
%P 1423--1462
%U https://proceedings.mlr.press/v178/agarwal22a.html
%V 178
%X The stochastic $K$-armed bandit problem has been studied extensively due to its applications in various domains ranging from online advertising to clinical trials. In practice however, the number of arms can be very large resulting in large memory requirements for simultaneously processing them. In this paper we consider a streaming setting where the arms are presented in a stream and the algorithm uses limited memory to process these arms. Here, the goal is not only to minimize regret, but also to do so in minimal memory. Previous algorithms for this problem operate in one of the two settings: they either use $\Omega(\log \log T)$ passes over the stream \citep{rathod2021reducing, ChaudhuriKa20, Liau+18}, or just a single pass \citep{Maiti+21}. In this paper we study the trade-off between memory and regret when $B$ passes over the stream are allowed, for any $B \geq 1$, and establish \emph{tight} regret upper and lower bounds for any $B$-pass algorithm. Our results uncover a surprising \emph{sharp transition phenomenon}: $O(1)$ memory is sufficient to achieve $\widetilde{\Theta}\left(T^{\frac{1}{2} + \frac{1}{2^{B+2}-2}}\right)$ regret in $B$ passes, and increasing the memory to any quantity that is $o(K)$ has almost no impact on further reducing this regret, unless we use $\Omega(K)$ memory. Our main technical contribution is our lower bound which requires the use of \emph{information-theoretic techniques} as well as ideas from \emph{round elimination} to show that the \emph{residual problem} remains challenging over subsequent passes.

APA


Agarwal, A., Khanna, S., & Patil, P. (2022). A Sharp Memory-Regret Trade-off for Multi-Pass Streaming Bandits. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1423-1462. Available from https://proceedings.mlr.press/v178/agarwal22a.html.
