Randomized Exploration in Generalized Linear Bandits
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2066-2076, 2020.
Abstract
We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
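To make the perturbed-history idea concrete, the following is a minimal sketch of a GLM-FPL-style loop for a logistic bandit. It is an illustration under stated assumptions, not the paper's implementation: the function names (`fit_perturbed_glm`, `run_glm_fpl`), the ridge term, the fixed number of Newton steps, the round-robin warm-up, and the default `noise_scale` are all hypothetical choices made here for a self-contained example. The only part taken from the abstract is the core step: add Gaussian noise to past rewards, fit a GLM to the perturbed history, and act greedily on the fitted model.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_perturbed_glm(X, y, noise_scale, rng, ridge=1.0, iters=20):
    """Fit a logistic GLM to rewards perturbed by Gaussian noise.

    The ridge term and fixed Newton iteration count are assumptions
    added for numerical stability; they are not taken from the paper.
    """
    y_pert = y + noise_scale * rng.standard_normal(len(y))
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        # Gradient and Hessian of the regularized negative log-likelihood
        # evaluated on the perturbed rewards.
        grad = X.T @ (p - y_pert) + ridge * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta


def run_glm_fpl(arms, theta_star, n_rounds, noise_scale=0.5, seed=0):
    """Simulate a logistic bandit and return the sequence of pulled arms."""
    rng = np.random.default_rng(seed)
    n_arms, d = arms.shape
    X_hist, y_hist, pulls = [], [], []
    for t in range(n_rounds):
        if t < d:
            # Round-robin warm-up (an assumption of this sketch).
            arm = t % n_arms
        else:
            theta = fit_perturbed_glm(
                np.array(X_hist), np.array(y_hist), noise_scale, rng
            )
            # Act greedily with respect to the perturbed fit.
            arm = int(np.argmax(arms @ theta))
        # Bernoulli reward with mean sigmoid(x^T theta_star).
        reward = float(rng.random() < sigmoid(arms[arm] @ theta_star))
        X_hist.append(arms[arm])
        y_hist.append(reward)
        pulls.append(arm)
    return pulls
```

Because a fresh noise vector is drawn every round, the greedy choice varies between rounds even when the history is unchanged, which is the source of exploration in this sketch. A GLM-TSL-style variant would instead fit the unperturbed history and sample the parameter from a Gaussian centered at that fit.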