Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:535-543, 2015.

Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrtK L n \log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-kveton15, title = {{Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits}}, author = {Branislav Kveton and Zheng Wen and Azin Ashkan and Csaba Szepesvari}, booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics}, pages = {535--543}, year = {2015}, editor = {Guy Lebanon and S. V. N. Vishwanathan}, volume = {38}, series = {Proceedings of Machine Learning Research}, address = {San Diego, California, USA}, month = {09--12 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v38/kveton15.pdf}, url = { http://proceedings.mlr.press/v38/kveton15.html }, abstract = {A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrtK L n \log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.} }
Endnote
%0 Conference Paper %T Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits %A Branislav Kveton %A Zheng Wen %A Azin Ashkan %A Csaba Szepesvari %B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2015 %E Guy Lebanon %E S. V. N. Vishwanathan %F pmlr-v38-kveton15 %I PMLR %P 535--543 %U http://proceedings.mlr.press/v38/kveton15.html %V 38 %X A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrtK L n \log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
RIS
TY - CPAPER TI - Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits AU - Branislav Kveton AU - Zheng Wen AU - Azin Ashkan AU - Csaba Szepesvari BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics DA - 2015/02/21 ED - Guy Lebanon ED - S. V. N. Vishwanathan ID - pmlr-v38-kveton15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 38 SP - 535 EP - 543 L1 - http://proceedings.mlr.press/v38/kveton15.pdf UR - http://proceedings.mlr.press/v38/kveton15.html AB - A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrtK L n \log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor. ER -
APA
Kveton, B., Wen, Z., Ashkan, A. & Szepesvari, C.. (2015). Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:535-543 Available from http://proceedings.mlr.press/v38/kveton15.html .

Related Material