Contextual Bandits with Linear Payoff Functions

Wei Chu, Lihong Li, Lev Reyzin, Robert Schapire
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:208-214, 2011.

Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.

Cite this Paper


BibTeX
@InProceedings{pmlr-v15-chu11a, title = {Contextual Bandits with Linear Payoff Functions}, author = {Chu, Wei and Li, Lihong and Reyzin, Lev and Schapire, Robert}, booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics}, pages = {208--214}, year = {2011}, editor = {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav}, volume = {15}, series = {Proceedings of Machine Learning Research}, address = {Fort Lauderdale, FL, USA}, month = {11--13 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v15/chu11a/chu11a.pdf}, url = {https://proceedings.mlr.press/v15/chu11a.html}, abstract = {In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.} }
Endnote
%0 Conference Paper %T Contextual Bandits with Linear Payoff Functions %A Wei Chu %A Lihong Li %A Lev Reyzin %A Robert Schapire %B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2011 %E Geoffrey Gordon %E David Dunson %E Miroslav Dudík %F pmlr-v15-chu11a %I PMLR %P 208--214 %U https://proceedings.mlr.press/v15/chu11a.html %V 15 %X In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors.
RIS
TY - CPAPER TI - Contextual Bandits with Linear Payoff Functions AU - Wei Chu AU - Lihong Li AU - Lev Reyzin AU - Robert Schapire BT - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics DA - 2011/06/14 ED - Geoffrey Gordon ED - David Dunson ED - Miroslav Dudík ID - pmlr-v15-chu11a PB - PMLR DP - Proceedings of Machine Learning Research VL - 15 SP - 208 EP - 214 L1 - http://proceedings.mlr.press/v15/chu11a/chu11a.pdf UR - https://proceedings.mlr.press/v15/chu11a.html AB - In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d dimensional feature vectors, we prove an $O\left(\sqrt{Td\ln^3(KT\ln(T)/\delta)}\right)$ regret bound that holds with probability $1-\delta$ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of $\Omega(\sqrt{Td})$ for this setting, matching the upper bound up to logarithmic factors. ER -
APA
Chu, W., Li, L., Reyzin, L. & Schapire, R.. (2011). Contextual Bandits with Linear Payoff Functions. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:208-214 Available from https://proceedings.mlr.press/v15/chu11a.html.

Related Material