An explore-then-commit algorithm for submodular maximization under full-bandit feedback

Guanyu Nie; Mridul Agarwal; Abhishek Kumar Umrawal; Vaneet Aggarwal; Christopher John Quinn

Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1541-1551, 2022.

Abstract

We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.
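
The explore-then-commit greedy idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `etc_greedy` and `coverage_reward`, the per-action sample count `m`, and the toy coverage instance are all hypothetical, and the paper's specific choice of exploration length is not reproduced here. The sketch assumes that in each of $k$ greedy phases every candidate set (current set plus one new element) is played `m` times, the element with the best empirical mean is added, and the final set is played for the remaining rounds. Only the scalar reward of the played set is observed, which is the full-bandit setting.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def etc_greedy(
    n: int,
    k: int,
    horizon: int,
    m: int,
    play: Callable[[FrozenSet[int]], float],
) -> Tuple[Set[int], List[float]]:
    """Illustrative explore-then-commit greedy (hypothetical sketch).

    n: number of base elements, k: cardinality constraint,
    horizon: total rounds T, m: plays per candidate set (assumed given),
    play: returns the (possibly noisy) reward of the played set only.
    """
    explore_rounds = m * sum(n - i for i in range(k))
    if explore_rounds > horizon:
        raise ValueError("exploration budget exceeds the horizon")

    rewards: List[float] = []
    chosen: Set[int] = set()

    # Exploration: k greedy phases, each estimating every candidate set.
    for _ in range(k):
        best_elem, best_mean = -1, float("-inf")
        for a in range(n):
            if a in chosen:
                continue
            action = frozenset(chosen | {a})
            samples = [play(action) for _ in range(m)]
            rewards.extend(samples)
            mean = sum(samples) / m
            if mean > best_mean:
                best_elem, best_mean = a, mean
        chosen.add(best_elem)

    # Commit: play the greedily built set for all remaining rounds.
    final_action = frozenset(chosen)
    while len(rewards) < horizon:
        rewards.append(play(final_action))
    return chosen, rewards


# Toy submodular reward (assumption): number of items covered by the set.
# Noise-free here so the run is reproducible; real rewards are stochastic.
COVER: Dict[int, Set[int]] = {0: {1, 2, 3, 7}, 1: {3, 4, 6}, 2: {5}, 3: {1, 2}}


def coverage_reward(action: FrozenSet[int]) -> float:
    covered: Set[int] = set()
    for a in action:
        covered |= COVER[a]
    return float(len(covered))


chosen, rewards = etc_greedy(n=4, k=2, horizon=50, m=3, play=coverage_reward)
print(sorted(chosen), len(rewards), rewards[-1])
```

In this sketch a larger `m` gives more reliable estimates of each candidate set at the cost of more rounds spent on suboptimal sets, which is the trade-off behind a $T^{2/3}$-type regret rate for explore-then-commit schemes.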

Cite this Paper

BibTeX


@InProceedings{pmlr-v180-nie22a,
  title = 	 {An explore-then-commit algorithm for submodular maximization under full-bandit feedback},
  author =       {Nie, Guanyu and Agarwal, Mridul and Umrawal, Abhishek Kumar and Aggarwal, Vaneet and Quinn, Christopher John},
  booktitle = 	 {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {1541--1551},
  year = 	 {2022},
  editor = 	 {Cussens, James and Zhang, Kun},
  volume = 	 {180},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {01--05 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v180/nie22a/nie22a.pdf},
  url = 	 {https://proceedings.mlr.press/v180/nie22a.html},
  abstract = 	 {We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.}
}

Endnote

%0 Conference Paper
%T An explore-then-commit algorithm for submodular maximization under full-bandit feedback
%A Guanyu Nie
%A Mridul Agarwal
%A Abhishek Kumar Umrawal
%A Vaneet Aggarwal
%A Christopher John Quinn
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-nie22a
%I PMLR
%P 1541--1551
%U https://proceedings.mlr.press/v180/nie22a.html
%V 180
%X We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.

APA


Nie, G., Agarwal, M., Umrawal, A. K., Aggarwal, V., & Quinn, C. J. (2022). An explore-then-commit algorithm for submodular maximization under full-bandit feedback. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1541-1551. Available from https://proceedings.mlr.press/v180/nie22a.html.
