Sum-max Submodular Bandits

Stephen U Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2323-2331, 2024.

Abstract

Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$ bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds of order $\sqrt{MKT}$ (ignoring log factors) for bandit feedback in the nonstochastic setting, where $T$ is the time horizon and $M$ is the cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.
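
A minimal illustrative sketch (not from the paper): assuming the sum-max form $f(S) = \sum_{i} \max_{a \in S} w_{i,a}$ suggested by the $M$-medians and facility location examples, the Python snippet below evaluates such an objective and runs the classical greedy algorithm, which guarantees a $\big(1 - \frac{1}{e}\big)$ approximation for monotone submodular maximization under a cardinality constraint $M$. The weight matrix w, the function names, and the toy dimensions are placeholders, not notation or code from the paper.

import numpy as np

def sum_max_value(w, S):
    # Sum-max objective f(S) = sum_i max_{a in S} w[i, a]; f of the empty set is 0.
    if not S:
        return 0.0
    return float(np.sum(np.max(w[:, list(S)], axis=1)))

def greedy_sum_max(w, M):
    # Greedily add at most M actions (columns), each time picking the largest marginal gain.
    # For monotone submodular objectives this is a (1 - 1/e)-approximation
    # (Nemhauser, Wolsey, and Fisher, 1978).
    K = w.shape[1]
    S = set()
    for _ in range(M):
        best_gain, best_a = 0.0, None
        for a in range(K):
            if a in S:
                continue
            gain = sum_max_value(w, S | {a}) - sum_max_value(w, S)
            if gain > best_gain:
                best_gain, best_a = gain, a
        if best_a is None:  # no remaining action improves the objective
            break
        S.add(best_a)
    return S

# Toy usage: 5 "clients", K = 4 actions, cardinality constraint M = 2.
rng = np.random.default_rng(0)
w = rng.random((5, 4))
S = greedy_sum_max(w, M=2)
print(S, sum_max_value(w, S))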

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-u-pasteris24a,
  title     = {Sum-max Submodular Bandits},
  author    = {U Pasteris, Stephen and Rumi, Alberto and Vitale, Fabio and Cesa-Bianchi, Nicol\`{o}},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2323--2331},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/u-pasteris24a/u-pasteris24a.pdf},
  url       = {https://proceedings.mlr.press/v238/u-pasteris24a.html},
  abstract  = {Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$ bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds of order $\sqrt{MKT}$ (ignoring log factors) for bandit feedback in the nonstochastic setting, where $T$ is the time horizon and $M$ is the cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.}
}
Endnote
%0 Conference Paper
%T Sum-max Submodular Bandits
%A Stephen U Pasteris
%A Alberto Rumi
%A Fabio Vitale
%A Nicolò Cesa-Bianchi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-u-pasteris24a
%I PMLR
%P 2323--2331
%U https://proceedings.mlr.press/v238/u-pasteris24a.html
%V 238
%X Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$ bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds of order $\sqrt{MKT}$ (ignoring log factors) for bandit feedback in the nonstochastic setting, where $T$ is the time horizon and $M$ is the cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.
APA
U Pasteris, S., Rumi, A., Vitale, F., & Cesa-Bianchi, N. (2024). Sum-max Submodular Bandits. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2323-2331. Available from https://proceedings.mlr.press/v238/u-pasteris24a.html.
