Adversarial Contextual Bandits Go Kernelized

Gergely Neu, Julia Olkhovskaya, Sattar Vakili
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:907-929, 2024.

Abstract

We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\tilde{O}(KT^{\frac{1}{2}\left(1+\frac{1}{c}\right)})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\tilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-neu24a,
  title     = {Adversarial Contextual Bandits Go Kernelized},
  author    = {Neu, Gergely and Olkhovskaya, Julia and Vakili, Sattar},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {907--929},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/neu24a/neu24a.pdf},
  url       = {https://proceedings.mlr.press/v237/neu24a.html},
  abstract  = {We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\tilde{O}(KT^{\frac{1}{2}\left(1+\frac{1}{c}\right)})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\tilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.}
}
Endnote
%0 Conference Paper %T Adversarial Contextual Bandits Go Kernelized %A Gergely Neu %A Julia Olkhovskaya %A Sattar Vakili %B Proceedings of The 35th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2024 %E Claire Vernade %E Daniel Hsu %F pmlr-v237-neu24a %I PMLR %P 907--929 %U https://proceedings.mlr.press/v237/neu24a.html %V 237 %X We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\tilde{O}(KT^{\frac{1}{2}\left(1+\frac{1}{c}\right)})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\tilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.
APA
Neu, G., Olkhovskaya, J. & Vakili, S. (2024). Adversarial Contextual Bandits Go Kernelized. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:907-929. Available from https://proceedings.mlr.press/v237/neu24a.html.