One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

Pierre Gaillard, Aadirupa Saha, Soham Dan
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7755-7773, 2023.

Abstract

We address the problem of Internal Regret in adversarial Sleeping Bandits and the relationship between different notions of sleeping regrets in multi-armed bandits. We propose a new concept called Internal Regret for sleeping multi-armed bandits (MAB) and present an algorithm that achieves sublinear Internal Regret, even when losses and availabilities are both adversarial. We demonstrate that a low internal regret leads to both low external regret and low policy regret for i.i.d. losses. Our contribution is unifying existing notions of regret in sleeping bandits and exploring their implications for each other. In addition, we extend our results to Dueling Bandits (DB), a preference feedback version of multi-armed bandits, and design a low-regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. We validate the effectiveness of our algorithms through empirical evaluations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-gaillard23a,
  title     = {One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits},
  author    = {Gaillard, Pierre and Saha, Aadirupa and Dan, Soham},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {7755--7773},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/gaillard23a/gaillard23a.pdf},
  url       = {https://proceedings.mlr.press/v206/gaillard23a.html},
  abstract  = {We address the problem of Internal Regret in adversarial Sleeping Bandits and the relationship between different notions of sleeping regrets in multi-armed bandits. We propose a new concept called Internal Regret for sleeping multi-armed bandits (MAB) and present an algorithm that achieves sublinear Internal Regret, even when losses and availabilities are both adversarial. We demonstrate that a low internal regret leads to both low external regret and low policy regret for i.i.d. losses. Our contribution is unifying existing notions of regret in sleeping bandits and exploring their implications for each other. In addition, we extend our results to Dueling Bandits (DB), a preference feedback version of multi-armed bandits, and design a low-regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. We validate the effectiveness of our algorithms through empirical evaluations.}
}
Endnote
%0 Conference Paper
%T One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
%A Pierre Gaillard
%A Aadirupa Saha
%A Soham Dan
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-gaillard23a
%I PMLR
%P 7755--7773
%U https://proceedings.mlr.press/v206/gaillard23a.html
%V 206
%X We address the problem of Internal Regret in adversarial Sleeping Bandits and the relationship between different notions of sleeping regrets in multi-armed bandits. We propose a new concept called Internal Regret for sleeping multi-armed bandits (MAB) and present an algorithm that achieves sublinear Internal Regret, even when losses and availabilities are both adversarial. We demonstrate that a low internal regret leads to both low external regret and low policy regret for i.i.d. losses. Our contribution is unifying existing notions of regret in sleeping bandits and exploring their implications for each other. In addition, we extend our results to Dueling Bandits (DB), a preference feedback version of multi-armed bandits, and design a low-regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. We validate the effectiveness of our algorithms through empirical evaluations.
APA
Gaillard, P., Saha, A. &amp; Dan, S. (2023). One Arrow, Two Kills: A Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7755-7773. Available from https://proceedings.mlr.press/v206/gaillard23a.html.