# Contextual Bandits with Linear Payoff Functions

[edit]

Wei Chu,
Lihong Li,
Lev Reyzin,
Robert Schapire
;

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:208-214, 2011.

#### Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(\sqrtTd\ln^3(KT\ln(T)/δ)) regret bound that holds with probability 1-δfor the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(\sqrtTd) for this setting, matching the upper bound up to logarithmic factors. [pdf]

#### Related Material