On Kernelized Multi-armed Bandits

Sayak Ray Chowdhury, Aditya Gopalan
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:844-853, 2017.

Abstract

We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization – Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favourable gains of the proposed strategies in many cases.
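
To make the flavour of these algorithms concrete, below is a minimal, self-contained Python/NumPy sketch of a GP-UCB-style loop of the kind the paper analyses. Everything specific in it is an illustrative assumption rather than the paper's specification: the RBF kernel and its lengthscale, the toy reward function, the values of B, R and delta, and the crude log-determinant proxy for the maximum information gain gamma_t. Only the overall shape (pick the arm maximizing posterior mean plus a beta_t-scaled posterior width, with beta_t of the form B + R*sqrt(2*(gamma_t + 1 + ln(1/delta))) as in IGP-UCB) follows the approach the abstract describes.

import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1-D point sets X1 and X2."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xstar, lam=1.0):
    """GP posterior mean and std at Xstar given noisy observations (X, y)."""
    K = rbf_kernel(X, X) + lam * np.eye(len(X))
    Ks = rbf_kernel(Xstar, X)
    mu = Ks @ np.linalg.solve(K, y)
    # Posterior variance: k(x,x) - k_s K^{-1} k_s^T (diagonal only; k(x,x)=1).
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) * x          # unknown reward function (toy choice)
arms = np.linspace(0.0, 2.0, 200)        # continuous arm set, discretized
B_norm, R, delta = 1.0, 0.1, 0.05        # RKHS-norm bound, noise scale, risk

X, y = np.empty(0), np.empty(0)
for t in range(1, 101):
    if len(X) == 0:
        x_t = rng.choice(arms)           # no data yet: pick an arm at random
    else:
        mu, sigma = gp_posterior(X, y, arms)
        # gamma_t (maximum information gain) is approximated by a rough
        # log-determinant proxy; the paper bounds it analytically per kernel.
        K = rbf_kernel(X, X)
        gamma_t = 0.5 * np.linalg.slogdet(np.eye(len(X)) + K)[1]
        beta_t = B_norm + R * np.sqrt(2 * (gamma_t + 1 + np.log(1 / delta)))
        x_t = arms[np.argmax(mu + beta_t * sigma)]  # UCB arm choice
    r_t = f(x_t) + R * rng.standard_normal()        # observe noisy reward
    X, y = np.append(X, x_t), np.append(y, r_t)

print("best arm found:", X[np.argmax(y)])

GP-TS would differ only in the arm-selection line: instead of maximizing the UCB index, it would maximize a single random function drawn from a suitably scaled posterior.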

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-chowdhury17a,
  title     = {On Kernelized Multi-armed Bandits},
  author    = {Sayak Ray Chowdhury and Aditya Gopalan},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {844--853},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/chowdhury17a/chowdhury17a.pdf},
  url       = {https://proceedings.mlr.press/v70/chowdhury17a.html},
  abstract  = {We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization – Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favourable gains of the proposed strategies in many cases.}
}
Endnote
%0 Conference Paper
%T On Kernelized Multi-armed Bandits
%A Sayak Ray Chowdhury
%A Aditya Gopalan
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-chowdhury17a
%I PMLR
%P 844--853
%U https://proceedings.mlr.press/v70/chowdhury17a.html
%V 70
%X We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization – Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favourable gains of the proposed strategies in many cases.
APA
Chowdhury, S.R. & Gopalan, A.. (2017). On Kernelized Multi-armed Bandits. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:844-853 Available from https://proceedings.mlr.press/v70/chowdhury17a.html.
