Combinatorial Semi-Bandits with Knapsacks

Karthik Abinav Sankararaman, Aleksandrs Slivkins
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1760-1770, 2018.

Abstract

We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited “resources" consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, support it with several motivating examples, and design an algorithm for it. Our regret bounds are comparable with those for BwK and combinatorial semi-bandits.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-sankararaman18a, title = {Combinatorial Semi-Bandits with Knapsacks}, author = {Sankararaman, Karthik Abinav and Slivkins, Aleksandrs}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {1760--1770}, year = {2018}, editor = {Storkey, Amos and Perez-Cruz, Fernando}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/sankararaman18a/sankararaman18a.pdf}, url = {https://proceedings.mlr.press/v84/sankararaman18a.html}, abstract = {We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited “resources" consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, support it with several motivating examples, and design an algorithm for it. Our regret bounds are comparable with those for BwK and combinatorial semi-bandits. } }
Endnote
%0 Conference Paper %T Combinatorial Semi-Bandits with Knapsacks %A Karthik Abinav Sankararaman %A Aleksandrs Slivkins %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-sankararaman18a %I PMLR %P 1760--1770 %U https://proceedings.mlr.press/v84/sankararaman18a.html %V 84 %X We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited “resources" consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, support it with several motivating examples, and design an algorithm for it. Our regret bounds are comparable with those for BwK and combinatorial semi-bandits.
APA
Sankararaman, K.A. & Slivkins, A.. (2018). Combinatorial Semi-Bandits with Knapsacks. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:1760-1770 Available from https://proceedings.mlr.press/v84/sankararaman18a.html.

Related Material