Contextual Bandits with Linear Payoff Functions

Wei Chu; Lihong Li; Lev Reyzin; Robert Schapire

Contextual Bandits with Linear Payoff Functions

Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:208-214, 2011.

Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.

Cite this Paper

BibTeX


@InProceedings{pmlr-v15-chu11a,
  title = 	 {Contextual Bandits with Linear Payoff Functions},
  author = 	 {Chu, Wei and Li, Lihong and Reyzin, Lev and Schapire, Robert},
  booktitle = 	 {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {208--214},
  year = 	 {2011},
  editor = 	 {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume = 	 {15},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Fort Lauderdale, FL, USA},
  month = 	 {11--13 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v15/chu11a/chu11a.pdf},
  url = 	 {https://proceedings.mlr.press/v15/chu11a.html},
  abstract = 	 {In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions.  For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem.  We also prove a lower bound  of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.}
}

Endnote

%0 Conference Paper
%T Contextual Bandits with Linear Payoff Functions
%A Wei Chu
%A Lihong Li
%A Lev Reyzin
%A Robert Schapire
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík	
%F pmlr-v15-chu11a
%I PMLR
%P 208--214
%U https://proceedings.mlr.press/v15/chu11a.html
%V 15
%X In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions.  For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem.  We also prove a lower bound  of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.

RIS


TY  - CPAPER
TI  - Contextual Bandits with Linear Payoff Functions
AU  - Wei Chu
AU  - Lihong Li
AU  - Lev Reyzin
AU  - Robert Schapire
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík	
ID  - pmlr-v15-chu11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 208
EP  - 214
L1  - http://proceedings.mlr.press/v15/chu11a/chu11a.pdf
UR  - https://proceedings.mlr.press/v15/chu11a.html
AB  - In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions.  For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem.  We also prove a lower bound  of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.
ER  -

APA


Chu, W., Li, L., Reyzin, L. & Schapire, R.. (2011). Contextual Bandits with Linear Payoff Functions. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:208-214 Available from https://proceedings.mlr.press/v15/chu11a.html.

Related Material

Download PDF