On Kernelized Multi-armed Bandits
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:844-853, 2017.
Abstract
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization – Improved GP-UCB (IGP-UCB) and GP-Thompson Sampling (GP-TS) – and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favourable gains of the proposed strategies in many cases.
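The upper-confidence-bound idea behind IGP-UCB can be sketched as follows: maintain a Gaussian process posterior over the unknown reward function and, at each round, pull the arm maximizing posterior mean plus a scaled posterior standard deviation. The kernel, lengthscale, noise level, test function, and the simplified exploration coefficient `beta_t` below are all illustrative assumptions, not the paper's exact choices (the paper sets its confidence width via the maximum information gain of the kernel).

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale**2))

def gp_posterior(X, y, X_star, noise=0.1):
    """GP posterior mean and std at candidate arms X_star (unit prior variance)."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    K_star = rbf_kernel(X_star, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mu, np.sqrt(var)

# UCB-style bandit loop over a discretized arm set.
rng = np.random.default_rng(0)
f = lambda x: np.sin(6 * x)               # hypothetical unknown reward function
arms = np.linspace(0.0, 1.0, 200)[:, None]
X, y = [], []
for t in range(1, 31):
    if not X:
        x_t = arms[rng.integers(len(arms))]  # first pull: arbitrary arm
    else:
        mu, sigma = gp_posterior(np.array(X), np.array(y), arms)
        beta_t = 1.0 + np.sqrt(2.0 * np.log(t + 1))  # simplified stand-in width
        x_t = arms[np.argmax(mu + beta_t * sigma)]   # optimism in face of uncertainty
    X.append(x_t)
    y.append(f(x_t[0]) + 0.1 * rng.normal())         # noisy reward observation
best = X[int(np.argmax(y))]
```

GP-TS replaces the argmax of `mu + beta_t * sigma` with the argmax of a single function sampled from the same posterior; everything else in the loop is unchanged.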