Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits

Taira Tsuchiya, Shinji Ito, Junya Honda
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:8117-8144, 2023.

Abstract

We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee; the regret is bounded near-optimally in both the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound depending on the tight suboptimality gap introduced by Kveton et al. (2015) with a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and adaptive learning rate. Finally, we numerically test the proposed algorithm and confirm its superior or competitive performance over existing algorithms, including Thompson sampling, under most settings.
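The abstract describes the algorithm as follow-the-regularized-leader (FTRL) with a refined regularizer and an adaptive learning rate. As a rough illustration of that framework only, and not of the paper's actual algorithm, below is a minimal sketch of FTRL with a standard 1/2-Tsallis-entropy regularizer (the Tsallis-INF style of Zimmert and Seldin) and a simple 1/sqrt(t) learning rate, simplified to the ordinary multi-armed bandit (one arm per round) rather than the combinatorial semi-bandit setting. All function names and the learning-rate schedule are illustrative assumptions.

import numpy as np

def tsallis_weights(L, eta, iters=80):
    # Solve for the normalizer z < min(L) such that
    #   sum_i 4 / (eta * (L_i - z))**2 = 1,
    # which gives the FTRL weights for the 1/2-Tsallis regularizer.
    K = L.shape[0]
    hi = L.min() - 1e-9                    # the sum blows up as z -> min(L)
    lo = L.min() - 2.0 * np.sqrt(K) / eta  # every term <= 1/K here, so the sum <= 1
    for _ in range(iters):
        z = 0.5 * (lo + hi)
        if np.sum(4.0 / (eta * (L - z)) ** 2) > 1.0:
            hi = z                         # sum too large: push z further below min(L)
        else:
            lo = z
    p = 4.0 / (eta * (L - lo)) ** 2
    return p / p.sum()                     # renormalize away residual bisection error

def ftrl_bandit(loss_matrix, seed=0):
    # loss_matrix: T x K array of losses in [0, 1]; returns the total incurred loss.
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    cum_est = np.zeros(K)                  # cumulative importance-weighted loss estimates
    total = 0.0
    for t in range(T):
        eta = 1.0 / np.sqrt(t + 1)         # simple adaptive learning rate (assumption)
        p = tsallis_weights(cum_est, eta)
        arm = rng.choice(K, p=p)
        total += loss_matrix[t, arm]
        cum_est[arm] += loss_matrix[t, arm] / p[arm]  # unbiased loss estimate
    return total

rng = np.random.default_rng(1)
print(ftrl_bandit(rng.uniform(size=(1000, 5))))       # toy run on i.i.d. uniform losses

The bisection in tsallis_weights is monotone in z, so it reliably finds the normalizing constant, and the importance-weighted update keeps the loss estimates unbiased; the paper's contribution lies in the refined regularizer and adaptive learning-rate design on top of this skeleton, which the sketch does not attempt to reproduce.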

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-tsuchiya23a,
  title     = {Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits},
  author    = {Tsuchiya, Taira and Ito, Shinji and Honda, Junya},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {8117--8144},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/tsuchiya23a/tsuchiya23a.pdf},
  url       = {https://proceedings.mlr.press/v206/tsuchiya23a.html},
  abstract  = {We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee; the regrets are bounded near-optimally in the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound depending on the tight suboptimality gap introduced by Kveton et al. (2015) with a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and adaptive learning rate. Finally, we numerically test the proposed algorithm and confirm its superior or competitive performance over existing algorithms, including Thompson sampling under most settings.}
}
Endnote
%0 Conference Paper
%T Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits
%A Taira Tsuchiya
%A Shinji Ito
%A Junya Honda
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-tsuchiya23a
%I PMLR
%P 8117--8144
%U https://proceedings.mlr.press/v206/tsuchiya23a.html
%V 206
%X We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee; the regrets are bounded near-optimally in the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound depending on the tight suboptimality gap introduced by Kveton et al. (2015) with a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and adaptive learning rate. Finally, we numerically test the proposed algorithm and confirm its superior or competitive performance over existing algorithms, including Thompson sampling under most settings.
APA
Tsuchiya, T., Ito, S. & Honda, J. (2023). Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:8117-8144. Available from https://proceedings.mlr.press/v206/tsuchiya23a.html.
