Sum-max Submodular Bandits

Stephen U. Pasteris; Alberto Rumi; Fabio Vitale; Nicolò Cesa-Bianchi

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2323-2331, 2024.

Abstract

Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.
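To make the $\big(1 - \frac{1}{e}\big)$ benchmark concrete, here is a minimal offline sketch. It assumes a sum-max function has the form $f(S) = \sum_i \max_{j \in S} w_{i,j}$ with nonnegative weights; the abstract does not spell out the definition, so this form, the weight matrix, and all function names are illustrative assumptions. The selection rule is the classical full-information greedy algorithm for cardinality-constrained monotone submodular maximization, not the bandit algorithm of the paper.

```python
import math
from itertools import combinations

def sum_max(weights, chosen):
    """Assumed sum-max form: for each row, take the largest weight
    among the chosen columns, then sum over rows (0 for the empty set)."""
    if not chosen:
        return 0.0
    return sum(max(row[j] for j in chosen) for row in weights)

def greedy(weights, budget):
    """Classical offline greedy: add the column with the best value, `budget` times."""
    n_cols = len(weights[0])
    chosen = set()
    for _ in range(budget):
        candidates = [j for j in range(n_cols) if j not in chosen]
        best = max(candidates, key=lambda j: sum_max(weights, chosen | {j}))
        chosen.add(best)
    return chosen

def best_subset(weights, budget):
    """Brute-force optimum over all subsets of size `budget` (small instances only)."""
    n_cols = len(weights[0])
    return max(combinations(range(n_cols), budget),
               key=lambda s: sum_max(weights, set(s)))

# Hypothetical instance: 3 rows, K = 4 actions, cardinality constraint M = 2.
WEIGHTS = [
    [0.9, 0.1, 0.0, 0.4],
    [0.0, 0.8, 0.2, 0.4],
    [0.1, 0.0, 0.7, 0.4],
]
M = 2

greedy_value = sum_max(WEIGHTS, greedy(WEIGHTS, M))
optimal_value = sum_max(WEIGHTS, set(best_subset(WEIGHTS, M)))

# Greedy is guaranteed at least a (1 - 1/e) fraction of the optimum
# for monotone submodular objectives under a cardinality constraint.
print(greedy_value, optimal_value, (1 - 1 / math.e) * optimal_value)
```

On this instance greedy is slightly suboptimal, which illustrates why regret in this setting is measured against a $\big(1 - \frac{1}{e}\big)$ fraction of the best fixed set rather than against the best set itself.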

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-pasteris24a,
  title = 	 {Sum-max Submodular Bandits},
  author =       {Pasteris, Stephen U. and Rumi, Alberto and Vitale, Fabio and Cesa-Bianchi, Nicol\`{o}},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2323--2331},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/pasteris24a/pasteris24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/pasteris24a.html},
  abstract = 	 {Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.}
}

Endnote

%0 Conference Paper
%T Sum-max Submodular Bandits
%A Stephen U. Pasteris
%A Alberto Rumi
%A Fabio Vitale
%A Nicolò Cesa-Bianchi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-pasteris24a
%I PMLR
%P 2323--2331
%U https://proceedings.mlr.press/v238/pasteris24a.html
%V 238
%X Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.

APA

Pasteris, S.U., Rumi, A., Vitale, F. & Cesa-Bianchi, N. (2024). Sum-max Submodular Bandits. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2323-2331. Available from https://proceedings.mlr.press/v238/pasteris24a.html.
