A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits

Pierre Laforgue; Giulia Clerici; Nicolò Cesa-Bianchi; Ran Gilad-Bachrach

A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits

Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:971-990, 2022.

Abstract

Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-laforgue22a,
  title = 	 { A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits },
  author =       {Laforgue, Pierre and Clerici, Giulia and Cesa-Bianchi, Nicol\`o and Gilad-Bachrach, Ran},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {971--990},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/laforgue22a/laforgue22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/laforgue22a.html},
  abstract = 	 { Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver. }
}

Endnote

%0 Conference Paper
%T  A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits 
%A Pierre Laforgue
%A Giulia Clerici
%A Nicolò Cesa-Bianchi
%A Ran Gilad-Bachrach
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera	
%F pmlr-v151-laforgue22a
%I PMLR
%P 971--990
%U https://proceedings.mlr.press/v151/laforgue22a.html
%V 151
%X  Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver.

APA


Laforgue, P., Clerici, G., Cesa-Bianchi, N. & Gilad-Bachrach, R.. (2022).  A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:971-990 Available from https://proceedings.mlr.press/v151/laforgue22a.html.

A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits

Abstract

Cite this Paper

Related Material