Resourceful Contextual Bandits

Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins
Proceedings of The 27th Conference on Learning Theory, PMLR 35:1109-1134, 2014.

Abstract

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.

Cite this Paper


BibTeX
@InProceedings{pmlr-v35-badanidiyuru14,
  title = {Resourceful Contextual Bandits},
  author = {Ashwinkumar Badanidiyuru and John Langford and Aleksandrs Slivkins},
  booktitle = {Proceedings of The 27th Conference on Learning Theory},
  pages = {1109--1134},
  year = {2014},
  editor = {Maria Florina Balcan and Vitaly Feldman and Csaba Szepesvári},
  volume = {35},
  series = {Proceedings of Machine Learning Research},
  address = {Barcelona, Spain},
  month = {13--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v35/badanidiyuru14.pdf},
  url = {http://proceedings.mlr.press/v35/badanidiyuru14.html},
  abstract = {We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.}
}
Endnote
%0 Conference Paper
%T Resourceful Contextual Bandits
%A Ashwinkumar Badanidiyuru
%A John Langford
%A Aleksandrs Slivkins
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-badanidiyuru14
%I PMLR
%J Proceedings of Machine Learning Research
%P 1109--1134
%U http://proceedings.mlr.press
%V 35
%W PMLR
%X We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.
RIS
TY  - CPAPER
TI  - Resourceful Contextual Bandits
AU  - Ashwinkumar Badanidiyuru
AU  - John Langford
AU  - Aleksandrs Slivkins
BT  - Proceedings of The 27th Conference on Learning Theory
PY  - 2014/05/29
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-badanidiyuru14
PB  - PMLR
SP  - 1109
DP  - PMLR
EP  - 1134
L1  - http://proceedings.mlr.press/v35/badanidiyuru14.pdf
UR  - http://proceedings.mlr.press/v35/badanidiyuru14.html
AB  - We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.
ER  -
APA
Badanidiyuru, A., Langford, J., & Slivkins, A. (2014). Resourceful Contextual Bandits. Proceedings of The 27th Conference on Learning Theory, in PMLR 35:1109-1134.
