Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Alekh Agarwal; Daniel Hsu; Satyen Kale; John Langford; Lihong Li; Robert Schapire

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1638-1646, 2014.

Abstract

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
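The abstract describes the contextual bandit protocol: observe a context, pick one of K actions, and see the reward for that action only. It also describes learning through an oracle for cost-sensitive classification. A minimal Python sketch of that interaction loop follows, under stated assumptions: a tiny finite policy class, a brute-force `argmax_oracle` standing in for the oracle, inverse-propensity-scored reward estimates, and oracle calls on a doubling schedule. It is not the paper's algorithm. The paper's method maintains a sparse distribution over policies, whereas this sketch substitutes plain epsilon-greedy exploration. All names (`ips_reward_vector`, `argmax_oracle`, `run`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Illustrative sketch only: epsilon-greedy exploration around an argmax
# oracle, NOT the algorithm from the paper.

Context = int
Policy = Callable[[Context], int]  # maps a context to one of K actions


def ips_reward_vector(k: int, action: int, reward: float, prob: float) -> List[float]:
    """Inverse-propensity-scored estimate of the full reward vector.

    Only the chosen action's entry is nonzero. Dividing by the probability of
    choosing it makes the estimate unbiased for every action that had a
    positive probability of being chosen.
    """
    est = [0.0] * k
    est[action] = reward / prob
    return est


def argmax_oracle(policies: Sequence[Policy],
                  data: List[Tuple[Context, List[float]]]) -> int:
    """Hypothetical stand-in for a cost-sensitive classification oracle.

    Brute-forces a small finite policy class and returns the index of the
    policy with the largest total estimated reward on `data` (ties go to
    the lowest index).
    """
    def total(pi: Policy) -> float:
        return sum(r_hat[pi(x)] for x, r_hat in data)

    return max(range(len(policies)), key=lambda i: total(policies[i]))


def run(policies: Sequence[Policy], k: int, contexts: Sequence[Context],
        reward_fn: Callable[[Context, int], float], horizon: int,
        epsilon: float = 0.1, seed: int = 0) -> Tuple[int, int]:
    """Epsilon-greedy exploration around the oracle's current best policy.

    Returns (index of the last oracle-selected policy, number of oracle calls).
    """
    rng = random.Random(seed)
    data: List[Tuple[Context, List[float]]] = []
    best = 0           # index of the current greedy policy
    oracle_calls = 0
    next_update = 1    # re-fit at rounds 1, 2, 4, 8, ... (doubling schedule)
    for t in range(1, horizon + 1):
        if t == next_update:
            if data:   # nothing to fit on before the first observation
                best = argmax_oracle(policies, data)
                oracle_calls += 1
            next_update *= 2
        x = rng.choice(contexts)
        greedy = policies[best](x)
        # Spread epsilon uniformly over all K actions and put the rest on the
        # greedy action, so every action keeps probability >= epsilon / k.
        probs = [epsilon / k] * k
        probs[greedy] += 1.0 - epsilon
        a = rng.choices(range(k), weights=probs)[0]
        r = reward_fn(x, a)  # bandit feedback: only the chosen action's reward
        data.append((x, ips_reward_vector(k, a, r, probs[a])))
    return best, oracle_calls
```

For example, take two contexts, K = 2, a reward of 1 exactly when the action equals the context, and the four policies `lambda x: 0`, `lambda x: 1`, `lambda x: x` and `lambda x: 1 - x`. A call with `horizon=1000` makes 9 oracle calls (at rounds 2, 4, ..., 512) and should return index 2, the identity policy. Re-fitting only on a doubling schedule keeps the number of oracle calls logarithmic in the horizon here. The paper's regret and oracle-call guarantees concern a different algorithm and are not reproduced by this sketch.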

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-agarwalb14,
  title     = {Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits},
  author    = {Agarwal, Alekh and Hsu, Daniel and Kale, Satyen and Langford, John and Li, Lihong and Schapire, Robert},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1638--1646},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/agarwalb14.pdf},
  url       = {https://proceedings.mlr.press/v32/agarwalb14.html},
  abstract  = {We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K \emph{actions} in response to the observed \emph{context}, and observes the \emph{reward} only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.}
}

Endnote

%0 Conference Paper
%T Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
%A Alekh Agarwal
%A Daniel Hsu
%A Satyen Kale
%A John Langford
%A Lihong Li
%A Robert Schapire
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-agarwalb14
%I PMLR
%P 1638--1646
%U https://proceedings.mlr.press/v32/agarwalb14.html
%V 32
%N 2
%X We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.

RIS


TY  - CPAPER
TI  - Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
AU  - Alekh Agarwal
AU  - Daniel Hsu
AU  - Satyen Kale
AU  - John Langford
AU  - Lihong Li
AU  - Robert Schapire
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-agarwalb14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1638
EP  - 1646
L1  - http://proceedings.mlr.press/v32/agarwalb14.pdf
UR  - https://proceedings.mlr.press/v32/agarwalb14.html
AB  - We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
ER  -

APA


Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L. & Schapire, R. (2014). Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1638-1646. Available from https://proceedings.mlr.press/v32/agarwalb14.html.
