Last Switch Dependent Bandits with Monotone Payoff Functions

Ayoub Foussoul; Vineet Goyal; Orestis Papadigenopoulos; Assaf Zeevi

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10265-10284, 2023.

Abstract

In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-foussoul23a,
  title = 	 {Last Switch Dependent Bandits with Monotone Payoff Functions},
  author =       {Foussoul, Ayoub and Goyal, Vineet and Papadigenopoulos, Orestis and Zeevi, Assaf},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {10265--10284},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/foussoul23a/foussoul23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/foussoul23a.html},
  abstract = 	 {In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.}
}

Endnote

%0 Conference Paper
%T Last Switch Dependent Bandits with Monotone Payoff Functions
%A Ayoub Foussoul
%A Vineet Goyal
%A Orestis Papadigenopoulos
%A Assaf Zeevi
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-foussoul23a
%I PMLR
%P 10265--10284
%U https://proceedings.mlr.press/v202/foussoul23a.html
%V 202
%X In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.

APA

Foussoul, A., Goyal, V., Papadigenopoulos, O. & Zeevi, A. (2023). Last Switch Dependent Bandits with Monotone Payoff Functions. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10265-10284. Available from https://proceedings.mlr.press/v202/foussoul23a.html.
