Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Fares Fourati; Vaneet Aggarwal; Christopher Quinn; Mohamed-Slim Alouini

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7455-7471, 2023.

Abstract

We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
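
For readers unfamiliar with the term, the $\frac{1}{2}$-regret above is most naturally read as the $\alpha$-regret commonly used in the combinatorial bandit literature, with $\alpha = \frac{1}{2}$. The following is a sketch under that assumption, not a definition quoted from the paper. The notation is introduced here for illustration only: $f$ is the expected reward function, $S^*$ is a subset of the $n$ arms maximizing $f$, and $S_t$ is the subset played in round $t$.

$$
\mathcal{R}_{\frac{1}{2}}(T) \;=\; \frac{1}{2}\, T\, f(S^*) \;-\; \mathbb{E}\left[\sum_{t=1}^{T} f(S_t)\right]
$$

Under this reading, the stated bound means that over $T$ rounds the expected cumulative reward falls short of half the optimal cumulative expected reward by at most $\tilde{\mathcal{O}}(n T^{\frac{2}{3}})$.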

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-fourati23a,
  title = 	 {Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback},
  author =       {Fourati, Fares and Aggarwal, Vaneet and Quinn, Christopher and Alouini, Mohamed-Slim},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {7455--7471},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/fourati23a/fourati23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/fourati23a.html},
  abstract = 	 {We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.}
}

Endnote

%0 Conference Paper
%T Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback
%A Fares Fourati
%A Vaneet Aggarwal
%A Christopher Quinn
%A Mohamed-Slim Alouini
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-fourati23a
%I PMLR
%P 7455--7471
%U https://proceedings.mlr.press/v206/fourati23a.html
%V 206
%X We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.

APA


Fourati, F., Aggarwal, V., Quinn, C. & Alouini, M. (2023). Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7455-7471. Available from https://proceedings.mlr.press/v206/fourati23a.html.
