Randomized Exploration in Generalized Linear Bandits

Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2066-2076, 2020.

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
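To make the two procedures concrete, the following is a minimal Python sketch of both algorithms for the logistic case (numpy only). It is an illustration under stated assumptions, not the paper's exact specification: the function names are invented here, the Newton solver stands in for whichever GLM fitter is used, and the paper's prescribed initialization rounds and perturbation scales are simplified to tunable constants a and lam.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_map(X, y, lam=1.0, iters=25):
    # Regularized logistic regression by Newton's method. Returns the
    # MAP estimate theta_hat and the Hessian H at the optimum; H is the
    # precision matrix of the Laplace approximation to the posterior.
    d = X.shape[1]
    theta = np.zeros(d)
    H = lam * np.eye(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(d)
        theta = theta - np.linalg.solve(H, grad)
    return theta, H

def glm_tsl_arm(X_hist, y_hist, arms, a=1.0, lam=1.0, rng=np.random):
    # GLM-TSL: sample theta_tilde ~ N(theta_hat, a^2 H^{-1}) from the
    # Laplace approximation, then act greedily under the sample.
    theta_hat, H = fit_logistic_map(X_hist, y_hist, lam)
    theta_tilde = rng.multivariate_normal(theta_hat, a**2 * np.linalg.inv(H))
    return int(np.argmax(sigmoid(arms @ theta_tilde)))

def glm_fpl_arm(X_hist, y_hist, arms, a=1.0, lam=1.0, rng=np.random):
    # GLM-FPL: refit the GLM to a history whose rewards are perturbed
    # by i.i.d. Gaussian noise, then act greedily under the fit.
    y_pert = y_hist + rng.normal(0.0, a, size=len(y_hist))
    theta_tilde, _ = fit_logistic_map(X_hist, y_pert, lam)
    return int(np.argmax(sigmoid(arms @ theta_tilde)))

# Toy run: K arms in R^d, Bernoulli rewards from a logistic model. The
# crude warm start (pull each arm once) is a simplification of the
# paper's initialization.
rng = np.random.default_rng(0)
d, K, n = 5, 20, 500
arms = rng.normal(size=(K, d)) / np.sqrt(d)
theta_star = rng.normal(size=d)
X_hist = arms.copy()
y_hist = (rng.random(K) < sigmoid(arms @ theta_star)).astype(float)
for t in range(n):
    i = glm_tsl_arm(X_hist, y_hist, arms, a=0.5, rng=rng)
    r = float(rng.random() < sigmoid(arms[i] @ theta_star))
    X_hist = np.vstack([X_hist, arms[i]])
    y_hist = np.append(y_hist, r)

The contrast the abstract draws is visible in the sketch: GLM-TSL randomizes the solution, by sampling from the Laplace approximation $N(\hat{\theta}, a^2 H^{-1})$, while GLM-FPL randomizes the data, by refitting to Gaussian-perturbed rewards; both routes to exploration attain $\tilde{O}(d \sqrt{n \log K})$ regret.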

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-kveton20a,
  title     = {Randomized Exploration in Generalized Linear Bandits},
  author    = {Kveton, Branislav and Zaheer, Manzil and Szepesvari, Csaba and Li, Lihong and Ghavamzadeh, Mohammad and Boutilier, Craig},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2066--2076},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  address   = {Online},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/kveton20a/kveton20a.pdf},
  url       = {http://proceedings.mlr.press/v108/kveton20a.html},
  abstract  = {We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.}
}
Endnote
%0 Conference Paper
%T Randomized Exploration in Generalized Linear Bandits
%A Branislav Kveton
%A Manzil Zaheer
%A Csaba Szepesvari
%A Lihong Li
%A Mohammad Ghavamzadeh
%A Craig Boutilier
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-kveton20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 2066--2076
%U http://proceedings.mlr.press
%V 108
%W PMLR
%X We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
APA
Kveton, B., Zaheer, M., Szepesvari, C., Li, L., Ghavamzadeh, M. & Boutilier, C. (2020). Randomized Exploration in Generalized Linear Bandits. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in PMLR 108:2066-2076.
