Graph-Triggered Rising Bandits

Gianmarco Genalti, Marco Mussi, Nicola Gatti, Marcello Restelli, Matteo Castiglioni, Alberto Maria Metelli
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:15351-15380, 2024.

Abstract

In this paper, we propose a novel generalization of rested and restless bandits in which the evolution of the arms’ expected rewards is governed by a graph defined over the arms. An edge connecting a pair of arms $(i,j)$ represents the fact that a pull of arm $i$ triggers the evolution of arm $j$, and vice versa. Interestingly, rested and restless bandits are both special cases of our model for suitable (degenerate) graphs, yet the model can represent far more general and interesting scenarios. We first tackle the problem of computing the optimal policy when no specific structure is assumed on the graph, showing that it is NP-hard. We then focus on a specific structure in which the graph is composed of a set of fully connected subgraphs (i.e., cliques), and we prove that the optimal policy can be computed in closed form. Next, we move to the learning problem, presenting regret minimization algorithms for the deterministic and stochastic cases. Our regret bounds highlight the complexity of the learning problem by incorporating instance-dependent terms that encode specific properties of the underlying graph structure. Moreover, we illustrate how knowledge of the underlying graph is not necessary for achieving the no-regret property.
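
To make the trigger semantics concrete, the following is a minimal Python sketch of a graph-triggered rising bandit environment in the deterministic case. The class name, the adjacency-matrix encoding of the graph, and the exponential rising curves are illustrative assumptions, not the paper's implementation; the identity matrix recovers the rested setting and the all-ones matrix recovers the restless setting, in line with the degenerate graphs mentioned in the abstract.

import numpy as np

class GraphTriggeredRisingBandit:
    """Hypothetical sketch (not the paper's code) of graph-triggered dynamics.

    graph[i, j] = 1 means that a pull of arm i triggers the evolution of arm j
    (the matrix is symmetric, so the converse holds as well).
    """

    def __init__(self, graph, reward_curves):
        self.graph = np.asarray(graph)              # symmetric K x K adjacency matrix
        self.reward_curves = reward_curves          # reward_curves[j](n), non-decreasing in n
        self.triggers = np.zeros(len(reward_curves), dtype=int)  # evolution counter per arm

    def pull(self, i):
        # Deterministic case: the reward of the pulled arm depends on how many
        # times its evolution has been triggered so far.
        reward = self.reward_curves[i](self.triggers[i])
        # A pull of arm i triggers the evolution of every arm j adjacent to i.
        self.triggers += self.graph[i]
        return reward


K = 3
# Illustrative rising (non-decreasing, bounded) expected-reward curves.
curves = [lambda n, c=c: 1.0 - c ** (n + 1) for c in (0.9, 0.7, 0.5)]

rested = np.eye(K, dtype=int)             # only self-loops: each arm evolves only when pulled
restless = np.ones((K, K), dtype=int)     # fully connected: every pull triggers every arm
cliques = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]])           # two cliques, {0, 1} and {2}

env = GraphTriggeredRisingBandit(cliques, curves)
print([round(env.pull(a), 3) for a in (0, 0, 2)])   # rewards grow as triggers accumulate

The `cliques` matrix above is an instance of the block structure (a set of fully connected subgraphs) for which the paper shows the optimal policy admits a closed form.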

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-genalti24a,
  title     = {Graph-Triggered Rising Bandits},
  author    = {Genalti, Gianmarco and Mussi, Marco and Gatti, Nicola and Restelli, Marcello and Castiglioni, Matteo and Metelli, Alberto Maria},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {15351--15380},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/genalti24a/genalti24a.pdf},
  url       = {https://proceedings.mlr.press/v235/genalti24a.html}
}
Endnote
%0 Conference Paper
%T Graph-Triggered Rising Bandits
%A Gianmarco Genalti
%A Marco Mussi
%A Nicola Gatti
%A Marcello Restelli
%A Matteo Castiglioni
%A Alberto Maria Metelli
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-genalti24a
%I PMLR
%P 15351--15380
%U https://proceedings.mlr.press/v235/genalti24a.html
%V 235
APA
Genalti, G., Mussi, M., Gatti, N., Restelli, M., Castiglioni, M. & Metelli, A.M. (2024). Graph-Triggered Rising Bandits. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:15351-15380. Available from https://proceedings.mlr.press/v235/genalti24a.html.