An explore-then-commit algorithm for submodular maximization under full-bandit feedback

Guanyu Nie, Mridul Agarwal, Abhishek Kumar Umrawal, Vaneet Aggarwal, Christopher John Quinn
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1541-1551, 2022.

Abstract

We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no information other than the reward of the selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
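The explore-then-commit greedy structure described in the abstract can be sketched as follows: build a size-$k$ set in $k$ greedy phases, where each phase samples the noisy reward of every candidate superset for a fixed number of rounds and then commits to the empirically best element; afterwards, play the final set for the rest of the horizon. This is a hedged illustration, not the paper's exact algorithm: the per-candidate exploration length `m` below is an arbitrary illustrative choice, whereas the paper derives the exploration schedule that yields the stated regret bound.

```python
def etcg_sketch(elements, k, reward, T):
    """Illustrative explore-then-commit greedy loop (not the paper's exact
    schedule). `reward(S)` returns a (possibly noisy) scalar reward for the
    set S -- full-bandit feedback: no per-element information is observed."""
    # Illustrative exploration length per candidate; the paper's analysis
    # prescribes the schedule achieving the O(n^{1/3} k^{4/3} T^{2/3}) bound.
    m = max(1, int(T ** (2 / 3)) // max(1, len(elements) * k))
    S, t = [], 0
    for _ in range(k):  # k greedy phases
        best_elem, best_mean = None, float("-inf")
        for e in elements:
            if e in S:
                continue
            total = 0.0
            for _ in range(m):  # explore: sample the candidate superset
                total += reward(S + [e])
                t += 1
            if total / m > best_mean:
                best_elem, best_mean = e, total / m
        S.append(best_elem)  # commit this phase's empirically best element
    while t < T:  # commit: play the chosen set for the remaining rounds
        reward(S)
        t += 1
    return S
```

With a noiseless modular reward the sketch reduces to ordinary greedy selection of the top-$k$ elements, which is a quick way to sanity-check the control flow.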

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-nie22a,
  title     = {An explore-then-commit algorithm for submodular maximization under full-bandit feedback},
  author    = {Nie, Guanyu and Agarwal, Mridul and Umrawal, Abhishek Kumar and Aggarwal, Vaneet and John Quinn, Christopher},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1541--1551},
  year      = {2022},
  editor    = {Cussens, James and Zhang, Kun},
  volume    = {180},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--05 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v180/nie22a/nie22a.pdf},
  url       = {https://proceedings.mlr.press/v180/nie22a.html},
  abstract  = {We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.}
}
Endnote
%0 Conference Paper
%T An explore-then-commit algorithm for submodular maximization under full-bandit feedback
%A Guanyu Nie
%A Mridul Agarwal
%A Abhishek Kumar Umrawal
%A Vaneet Aggarwal
%A Christopher John Quinn
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-nie22a
%I PMLR
%P 1541--1551
%U https://proceedings.mlr.press/v180/nie22a.html
%V 180
%X We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.
APA
Nie, G., Agarwal, M., Umrawal, A. K., Aggarwal, V., & John Quinn, C. (2022). An explore-then-commit algorithm for submodular maximization under full-bandit feedback. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1541-1551. Available from https://proceedings.mlr.press/v180/nie22a.html.
