Efficient and robust algorithms for adversarial linear contextual bandits

Gergely Neu; Julia Olkhovskaya

Efficient and robust algorithms for adversarial linear contextual bandits

Gergely Neu, Julia Olkhovskaya

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:3049-3068, 2020.

Abstract

We consider an adversarial variant of the classic

$K$ -armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time. Under the assumption that the

$d$ -dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of

$\widetilde{O}(\sqrt{KdT})$ over

$T$ rounds, which matches the best known lower bound for this problem. Our second algorithm, RobustLinExp3, is shown to be robust to misspecification, in that it achieves a regret bound of

$\widetilde{O}((Kd)^{1/3}T^{2/3}) + \varepsilon \sqrt{d} T$ if the true reward function is linear up to an additive nonlinear error uniformly bounded in absolute value by

$\varepsilon$ . To our knowledge, our performance guarantees constitute the very first results on this problem setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-neu20b,
  title = 	 {Efficient and robust algorithms for adversarial linear contextual bandits},
  author =       {Neu, Gergely and Olkhovskaya, Julia},
  booktitle = 	 {Proceedings of Thirty Third Conference on Learning Theory},
  pages = 	 {3049--3068},
  year = 	 {2020},
  editor = 	 {Abernethy, Jacob and Agarwal, Shivani},
  volume = 	 {125},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--12 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v125/neu20b/neu20b.pdf},
  url = 	 {https://proceedings.mlr.press/v125/neu20b.html},
  abstract = 	 { We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of $\widetilde{O}(\sqrt{KdT})$ over $T$ rounds, which matches the best known lower bound for this problem. Our second algorithm, RobustLinExp3, is shown to be robust to misspecification, in that it achieves a regret bound of $\widetilde{O}((Kd)^{1/3}T^{2/3}) + \varepsilon \sqrt{d} T$ if the true reward function is linear up to an additive nonlinear error uniformly bounded in absolute value by $\varepsilon$. To our knowledge, our performance guarantees constitute the very first results on this problem setting.}
}

Endnote

%0 Conference Paper
%T Efficient and robust algorithms for adversarial linear contextual bandits
%A Gergely Neu
%A Julia Olkhovskaya
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal	
%F pmlr-v125-neu20b
%I PMLR
%P 3049--3068
%U https://proceedings.mlr.press/v125/neu20b.html
%V 125
%X  We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm are allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of $\widetilde{O}(\sqrt{KdT})$ over $T$ rounds, which matches the best known lower bound for this problem. Our second algorithm, RobustLinExp3, is shown to be robust to misspecification, in that it achieves a regret bound of $\widetilde{O}((Kd)^{1/3}T^{2/3}) + \varepsilon \sqrt{d} T$ if the true reward function is linear up to an additive nonlinear error uniformly bounded in absolute value by $\varepsilon$. To our knowledge, our performance guarantees constitute the very first results on this problem setting.

APA


Neu, G. & Olkhovskaya, J.. (2020). Efficient and robust algorithms for adversarial linear contextual bandits. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:3049-3068 Available from https://proceedings.mlr.press/v125/neu20b.html.

Related Material

Download PDF