Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1638-1646, 2014.

Abstract

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm among approaches that work for general policy classes. We conduct a proof-of-concept experiment that demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
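The interaction protocol described above can be sketched in a few lines. This is a minimal illustration of the contextual bandit loop only, with a simple epsilon-greedy learner standing in for the paper's oracle-based algorithm; all names here (`contextual_bandit_loop`, `EpsilonGreedy`) are hypothetical and not from the paper.

```python
import random

def contextual_bandit_loop(T, K, get_context, get_reward, learner):
    """Run the contextual bandit protocol for T rounds with K actions.

    Each round: observe a context, pick one action, and see the reward
    for that action only (partial feedback).
    """
    total = 0.0
    for t in range(T):
        x = get_context(t)        # observe context
        a = learner.act(x)        # choose one of K actions
        r = get_reward(t, x, a)   # reward revealed only for chosen action
        learner.update(x, a, r)   # learner sees only the triple (x, a, r)
        total += r
    return total

class EpsilonGreedy:
    """Context-ignoring stand-in learner (NOT the paper's algorithm)."""
    def __init__(self, K, eps=0.1):
        self.K, self.eps = K, eps
        self.counts = [0] * K
        self.sums = [0.0] * K

    def act(self, x):
        if random.random() < self.eps:
            return random.randrange(self.K)   # explore
        means = [s / c if c else 0.0
                 for s, c in zip(self.sums, self.counts)]
        return max(range(self.K), key=lambda a: means[a])  # exploit

    def update(self, x, a, r):
        self.counts[a] += 1
        self.sums[a] += r
```

The paper's contribution is a learner for this loop that consults a cost-sensitive classification oracle, achieving optimal regret with only Õ(√(KT)) oracle calls over T rounds.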

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-agarwalb14,
  title     = {Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits},
  author    = {Agarwal, Alekh and Hsu, Daniel and Kale, Satyen and Langford, John and Li, Lihong and Schapire, Robert},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1638--1646},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/agarwalb14.pdf},
  url       = {https://proceedings.mlr.press/v32/agarwalb14.html},
  abstract  = {We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ \emph{actions} in response to the observed \emph{context}, and observes the \emph{reward} only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all $T$ rounds. By doing so, we obtain the most practical contextual bandit learning algorithm among approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.}
}
Endnote
%0 Conference Paper
%T Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
%A Alekh Agarwal
%A Daniel Hsu
%A Satyen Kale
%A John Langford
%A Lihong Li
%A Robert Schapire
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-agarwalb14
%I PMLR
%P 1638--1646
%U https://proceedings.mlr.press/v32/agarwalb14.html
%V 32
%N 2
%X We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm among approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
RIS
TY - CPAPER
TI - Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
AU - Alekh Agarwal
AU - Daniel Hsu
AU - Satyen Kale
AU - John Langford
AU - Lihong Li
AU - Robert Schapire
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/06/18
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-agarwalb14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 2
SP - 1638
EP - 1646
L1 - http://proceedings.mlr.press/v32/agarwalb14.pdf
UR - https://proceedings.mlr.press/v32/agarwalb14.html
AB - We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm among approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
ER -
APA
Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L. & Schapire, R. (2014). Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1638-1646. Available from https://proceedings.mlr.press/v32/agarwalb14.html.