Efficient Learning in Large-Scale Combinatorial Semi-Bandits

Zheng Wen; Branislav Kveton; Azin Ashkan

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1113-1122, 2015.

Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experiment results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.
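The interaction loop described in the abstract can be illustrated with a minimal sketch of Thompson sampling with linear generalization. This is not the authors' pseudocode. It assumes a Gaussian prior over a hypothetical coefficient vector `theta`, Gaussian observation noise, and a simple "pick the top k items" cardinality constraint standing in for the offline combinatorial oracle. The names `top_k_oracle`, `run_linear_ts_semi_bandit`, `noise_sd`, and `prior_sd` are all illustrative.

```python
import numpy as np


def top_k_oracle(weights, k):
    """Hypothetical offline oracle: indices of the k items with the largest weights."""
    return np.argsort(weights)[-k:]


def run_linear_ts_semi_bandit(features, theta_true, k, steps,
                              noise_sd=0.1, prior_sd=1.0, seed=0):
    """Sketch of linear Thompson sampling for a top-k semi-bandit (assumed setup).

    features   : (n_items, d) array, one feature vector per ground item
    theta_true : (d,) array used only to simulate the stochastic weights
    """
    rng = np.random.default_rng(seed)
    _, d = features.shape
    mean = np.zeros(d)               # posterior mean of theta
    cov = prior_sd ** 2 * np.eye(d)  # posterior covariance of theta
    payoffs = []
    for _ in range(steps):
        # 1. Sample a coefficient vector from the current posterior.
        theta_sample = rng.multivariate_normal(mean, cov)
        # 2. Solve the offline problem on the sampled item weights.
        chosen = top_k_oracle(features @ theta_sample, k)
        # 3. Semi-bandit feedback: a noisy weight for each chosen item.
        observed = features[chosen] @ theta_true + noise_sd * rng.standard_normal(k)
        # 4. Update the posterior one observation at a time (Kalman-style update).
        for x, w in zip(features[chosen], observed):
            cov_x = cov @ x
            denom = x @ cov_x + noise_sd ** 2
            mean = mean + cov_x * (w - x @ mean) / denom
            cov = cov - np.outer(cov_x, cov_x) / denom
        # The payoff of the step is the sum of the observed weights.
        payoffs.append(observed.sum())
    return mean, np.array(payoffs)
```

In this sketch the posterior has dimension `d` (the number of features) rather than the number of items, which is the sense in which linear generalization lets learning scale to many items. Any other efficiently solvable offline problem could replace `top_k_oracle`.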

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-wen15,
  title = 	 {Efficient Learning in Large-Scale Combinatorial Semi-Bandits},
  author = 	 {Wen, Zheng and Kveton, Branislav and Ashkan, Azin},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1113--1122},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/wen15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/wen15.html},
  abstract = 	 {A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experiment results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.}
}

Endnote

%0 Conference Paper
%T Efficient Learning in Large-Scale Combinatorial Semi-Bandits
%A Zheng Wen
%A Branislav Kveton
%A Azin Ashkan
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-wen15
%I PMLR
%P 1113--1122
%U https://proceedings.mlr.press/v37/wen15.html
%V 37
%X A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experiment results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.

RIS


TY  - CPAPER
TI  - Efficient Learning in Large-Scale Combinatorial Semi-Bandits
AU  - Zheng Wen
AU  - Branislav Kveton
AU  - Azin Ashkan
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-wen15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1113
EP  - 1122
L1  - http://proceedings.mlr.press/v37/wen15.pdf
UR  - https://proceedings.mlr.press/v37/wen15.html
AB  - A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experiment results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.
ER  -

APA


Wen, Z., Kveton, B. & Ashkan, A. (2015). Efficient Learning in Large-Scale Combinatorial Semi-Bandits. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1113-1122. Available from https://proceedings.mlr.press/v37/wen15.html.
