Contextual Bandits with Linear Payoff Functions

Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire
; Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings 15:208-214, 2011.

Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf]

Cite this Paper


BibTeX
@InProceedings{pmlr-v15-chu11a, title = {Contextual Bandits with Linear Payoff Functions}, author = {Wei Chu and Lihong Li and Lev Reyzin and Robert Schapire}, booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics}, pages = {208--214}, year = {2011}, editor = {Geoffrey Gordon and David Dunson and Miroslav Dudík}, volume = {15}, series = {Proceedings of Machine Learning Research}, address = {Fort Lauderdale, FL, USA}, month = {11--13 Apr}, publisher = {JMLR Workshop and Conference Proceedings}, pdf = {http://proceedings.mlr.press/v15/chu11a/chu11a.pdf}, url = {http://proceedings.mlr.press/v15/chu11a.html}, abstract = {In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf]} }
Endnote
%0 Conference Paper %T Contextual Bandits with Linear Payoff Functions %A Wei Chu %A Lihong Li %A Lev Reyzin %A Robert Schapire %B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2011 %E Geoffrey Gordon %E David Dunson %E Miroslav Dudík %F pmlr-v15-chu11a %I PMLR %J Proceedings of Machine Learning Research %P 208--214 %U http://proceedings.mlr.press %V 15 %W PMLR %X In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf]
RIS
TY - CPAPER TI - Contextual Bandits with Linear Payoff Functions AU - Wei Chu AU - Lihong Li AU - Lev Reyzin AU - Robert Schapire BT - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics PY - 2011/06/14 DA - 2011/06/14 ED - Geoffrey Gordon ED - David Dunson ED - Miroslav Dudík ID - pmlr-v15-chu11a PB - PMLR SP - 208 DP - PMLR EP - 214 L1 - http://proceedings.mlr.press/v15/chu11a/chu11a.pdf UR - http://proceedings.mlr.press/v15/chu11a.html AB - In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf] ER -
APA
Chu, W., Li, L., Reyzin, L. & Schapire, R.. (2011). Contextual Bandits with Linear Payoff Functions. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in PMLR 15:208-214

Related Material