Stochastic Rising Bandits

Alberto Maria Metelli; Francesco Trovò; Matteo Pirola; Marcello Restelli

Stochastic Rising Bandits

Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15421-15457, 2022.

Abstract

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for the non-stationary bandits.

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-metelli22a,
  title = 	 {Stochastic Rising Bandits},
  author =       {Metelli, Alberto Maria and Trov{\`o}, Francesco and Pirola, Matteo and Restelli, Marcello},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {15421--15457},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/metelli22a/metelli22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/metelli22a.html},
  abstract = 	 {This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for the non-stationary bandits.}
}

Endnote

%0 Conference Paper
%T Stochastic Rising Bandits
%A Alberto Maria Metelli
%A Francesco Trovò
%A Matteo Pirola
%A Marcello Restelli
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-metelli22a
%I PMLR
%P 15421--15457
%U https://proceedings.mlr.press/v162/metelli22a.html
%V 162
%X This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for the non-stationary bandits.

APA

Metelli, A.M., Trovò, F., Pirola, M. & Restelli, M.. (2022). Stochastic Rising Bandits. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:15421-15457 Available from https://proceedings.mlr.press/v162/metelli22a.html.

Related Material

Download PDF