Sum-max Submodular Bandits

Stephen U. Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2323-2331, 2024.

Abstract

Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-pasteris24a,
  title     = {Sum-max Submodular Bandits},
  author    = {Pasteris, Stephen U. and Rumi, Alberto and Vitale, Fabio and Cesa-Bianchi, Nicol\`{o}},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2323--2331},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/pasteris24a/pasteris24a.pdf},
  url       = {https://proceedings.mlr.press/v238/pasteris24a.html},
  abstract  = {Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.}
}
Endnote
%0 Conference Paper
%T Sum-max Submodular Bandits
%A Stephen U. Pasteris
%A Alberto Rumi
%A Fabio Vitale
%A Nicolò Cesa-Bianchi
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-pasteris24a
%I PMLR
%P 2323--2331
%U https://proceedings.mlr.press/v238/pasteris24a.html
%V 238
%X Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions of $M$-medians and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{\mathcal{O}}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. We also extend our results to a bandit version of the facility location problem.
APA
Pasteris, S.U., Rumi, A., Vitale, F. & Cesa-Bianchi, N. (2024). Sum-max Submodular Bandits. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2323-2331. Available from https://proceedings.mlr.press/v238/pasteris24a.html.