# Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

[edit]

Alekh Agarwal,
Daniel Hsu,
Satyen Kale,
John Langford,
Lihong Li,
Robert Schapire
;

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1638-1646, 2014.

#### Abstract

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K \emphactions in response to the observed \emphcontext, and observes the \emphreward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only \otil(\sqrtKT) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.

#### Related Material