Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

Branislav Kveton; Zheng Wen; Azin Ashkan; Csaba Szepesvari

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:535-543, 2015.

Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
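The setting in the abstract can be illustrated with a minimal sketch, given here under stated assumptions rather than as the paper's algorithm. This page does not specify the algorithm, so the code assumes the simplest constraint (choose any K of the L items), Bernoulli item weights, and a confidence radius of sqrt(1.5 log t / T_i). The function name `comb_ucb_top_k` and all parameter names are hypothetical.

```python
import math
import random


def comb_ucb_top_k(means, K, n, seed=0):
    """Simulate a UCB-style semi-bandit for n steps.

    Assumptions (not taken from the source page): the feasible set is
    "any K of the L items", weights are Bernoulli with the given means,
    and the confidence radius is sqrt(1.5 * log(t) / pulls).
    Returns the cumulative expected regret.
    """
    rng = random.Random(seed)
    L = len(means)
    pulls = [0] * L      # number of times each item was observed
    est = [0.0] * L      # running mean of observed weights per item
    best = sum(sorted(means, reverse=True)[:K])  # optimal expected payoff
    regret = 0.0

    for t in range(1, n + 1):
        # Upper confidence bound per item; unobserved items come first.
        ucb = [
            math.inf if pulls[i] == 0
            else est[i] + math.sqrt(1.5 * math.log(t) / pulls[i])
            for i in range(L)
        ]
        # With a top-K constraint, the optimization step is a sort.
        chosen = sorted(range(L), key=lambda i: ucb[i], reverse=True)[:K]
        regret += max(0.0, best - sum(means[i] for i in chosen))

        # Semi-bandit feedback: the weight of every chosen item is seen.
        for i in chosen:
            w = 1.0 if rng.random() < means[i] else 0.0
            pulls[i] += 1
            est[i] += (w - est[i]) / pulls[i]

    return regret
```

For example, `comb_ucb_top_k([0.9, 0.8, 0.3, 0.2, 0.1], K=2, n=2000)` returns the regret accumulated over 2000 steps. For other constraint families, the sort would be replaced by whatever offline optimizer solves the underlying combinatorial problem.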

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-kveton15,
  title = 	 {{Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits}},
  author = 	 {Kveton, Branislav and Wen, Zheng and Ashkan, Azin and Szepesvari, Csaba},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {535--543},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/kveton15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/kveton15.html},
  abstract = 	 {A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.}
}

Endnote

%0 Conference Paper
%T Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
%A Branislav Kveton
%A Zheng Wen
%A Azin Ashkan
%A Csaba Szepesvari
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-kveton15
%I PMLR
%P 535--543
%U https://proceedings.mlr.press/v38/kveton15.html
%V 38
%X A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.

RIS


TY  - CPAPER
TI  - Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
AU  - Branislav Kveton
AU  - Zheng Wen
AU  - Azin Ashkan
AU  - Csaba Szepesvari
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-kveton15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 535
EP  - 543
L1  - http://proceedings.mlr.press/v38/kveton15.pdf
UR  - https://proceedings.mlr.press/v38/kveton15.html
AB  - A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
ER  -

APA


Kveton, B., Wen, Z., Ashkan, A. & Szepesvari, C. (2015). Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:535-543. Available from https://proceedings.mlr.press/v38/kveton15.html.
