Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Mridul Agarwal; Vaneet Aggarwal; Christopher J. Quinn; Abhishek K. Umrawal

Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal

Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:306-339, 2021.

Abstract

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.

Cite this Paper

BibTeX


@InProceedings{pmlr-v132-agarwal21c,
  title = 	 {Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback},
  author =       {Agarwal, Mridul and Aggarwal, Vaneet and Quinn, Christopher J. and Umrawal, Abhishek K.},
  booktitle = 	 {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages = 	 {306--339},
  year = 	 {2021},
  editor = 	 {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan},
  volume = 	 {132},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v132/agarwal21c/agarwal21c.pdf},
  url = 	 {https://proceedings.mlr.press/v132/agarwal21c.html},
  abstract = 	 {Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.}
}

Endnote

%0 Conference Paper
%T Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback
%A Mridul Agarwal
%A Vaneet Aggarwal
%A Christopher J. Quinn
%A Abhishek K. Umrawal
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato	
%F pmlr-v132-agarwal21c
%I PMLR
%P 306--339
%U https://proceedings.mlr.press/v132/agarwal21c.html
%V 132
%X Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.

APA


Agarwal, M., Aggarwal, V., Quinn, C.J. & Umrawal, A.K.. (2021). Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:306-339 Available from https://proceedings.mlr.press/v132/agarwal21c.html.

Related Material

Download PDF