Resourceful Contextual Bandits

Ashwinkumar Badanidiyuru; John Langford; Aleksandrs Slivkins

Proceedings of The 27th Conference on Learning Theory, PMLR 35:1109-1134, 2014.

Abstract

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks, Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.
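To make the setting concrete, here is a minimal sketch of the interaction protocol for a contextual bandit with one budgeted resource, using the abstract's dynamic-pricing example. It is not the paper's algorithm and involves no learning. The names `simulate`, `fixed_price`, `context_aware`, the two customer types, and the valuation distributions are all hypothetical, chosen only for illustration. The assumption encoded here is the usual bandits-with-knapsacks convention: the run ends when the time horizon is reached or the resource is exhausted.

```python
import random

# Hypothetical toy environment: every name and number below is an
# illustrative assumption, not something taken from the paper.

def simulate(policy, horizon, inventory, seed=0):
    """Run one episode; return (total_reward, rounds_played, inventory_left)."""
    rng = random.Random(seed)
    total_reward = 0.0
    rounds = 0
    # Stop at the time horizon or when the single resource runs out.
    while rounds < horizon and inventory > 0:
        # The context is observed before the action is chosen.
        context = rng.choice(["budget", "premium"])
        # The customer's valuation is hidden from the policy.
        if context == "budget":
            valuation = rng.uniform(0.0, 2.0)
        else:
            valuation = rng.uniform(1.0, 4.0)
        price = policy(context)  # action: a posted price
        if valuation >= price:
            total_reward += price  # reward: revenue from the sale
            inventory -= 1         # resource consumption: one item
        rounds += 1
    return total_reward, rounds, inventory


def fixed_price(_context):
    """A policy that ignores the context."""
    return 2.0


def context_aware(context):
    """A policy that maps each context to its own price."""
    return 1.0 if context == "budget" else 3.0


if __name__ == "__main__":
    for name, policy in [("fixed", fixed_price), ("context-aware", context_aware)]:
        reward, rounds, left = simulate(policy, horizon=1000, inventory=100)
        print(f"{name}: reward={reward:.1f} rounds={rounds} left={left}")
```

A learning algorithm for this setting would have to choose among such policies from observed rewards and consumption while respecting the budget; the sketch only fixes the protocol against which such an algorithm would be evaluated.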

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-badanidiyuru14,
  title = 	 {Resourceful Contextual Bandits},
  author = 	 {Badanidiyuru, Ashwinkumar and Langford, John and Slivkins, Aleksandrs},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {1109--1134},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/badanidiyuru14.pdf},
  url = 	 {https://proceedings.mlr.press/v35/badanidiyuru14.html},
  abstract = 	 {We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.  We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks,  Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.}
}

Endnote

%0 Conference Paper
%T Resourceful Contextual Bandits
%A Ashwinkumar Badanidiyuru
%A John Langford
%A Aleksandrs Slivkins
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-badanidiyuru14
%I PMLR
%P 1109--1134
%U https://proceedings.mlr.press/v35/badanidiyuru14.html
%V 35
%X We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.  We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks,  Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.

RIS


TY  - CPAPER
TI  - Resourceful Contextual Bandits
AU  - Ashwinkumar Badanidiyuru
AU  - John Langford
AU  - Aleksandrs Slivkins
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-badanidiyuru14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 1109
EP  - 1134
L1  - http://proceedings.mlr.press/v35/badanidiyuru14.pdf
UR  - https://proceedings.mlr.press/v35/badanidiyuru14.html
AB  - We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.  We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and bandits with resource constraints (bandits with knapsacks,  Badanidiyuru et al. (2013a)), and prove a regret guarantee with near-optimal statistical properties.
ER  -

APA


Badanidiyuru, A., Langford, J. & Slivkins, A. (2014). Resourceful Contextual Bandits. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:1109-1134. Available from https://proceedings.mlr.press/v35/badanidiyuru14.html.
