Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:306-339, 2021.

Abstract

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-agarwal21c, title = {Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback}, author = {Agarwal, Mridul and Aggarwal, Vaneet and Quinn, Christopher J. and Umrawal, Abhishek K.}, booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory}, pages = {306--339}, year = {2021}, editor = {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan}, volume = {132}, series = {Proceedings of Machine Learning Research}, month = {16--19 Mar}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v132/agarwal21c/agarwal21c.pdf}, url = {https://proceedings.mlr.press/v132/agarwal21c.html}, abstract = {Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.} }
Endnote
%0 Conference Paper %T Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback %A Mridul Agarwal %A Vaneet Aggarwal %A Christopher J. Quinn %A Abhishek K. Umrawal %B Proceedings of the 32nd International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2021 %E Vitaly Feldman %E Katrina Ligett %E Sivan Sabato %F pmlr-v132-agarwal21c %I PMLR %P 306--339 %U https://proceedings.mlr.press/v132/agarwal21c.html %V 132 %X Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$.
APA
Agarwal, M., Aggarwal, V., Quinn, C.J. & Umrawal, A.K.. (2021). Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:306-339 Available from https://proceedings.mlr.press/v132/agarwal21c.html.

Related Material