Randomized Exploration in Generalized Linear Bandits

Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2066-2076, 2020.

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-kveton20a, title = {Randomized Exploration in Generalized Linear Bandits}, author = {Kveton, Branislav and Zaheer, Manzil and Szepesvari, Csaba and Li, Lihong and Ghavamzadeh, Mohammad and Boutilier, Craig}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {2066--2076}, year = {2020}, editor = {Chiappa, Silvia and Calandra, Roberto}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/kveton20a/kveton20a.pdf}, url = {https://proceedings.mlr.press/v108/kveton20a.html}, abstract = {We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.} }
Endnote
%0 Conference Paper %T Randomized Exploration in Generalized Linear Bandits %A Branislav Kveton %A Manzil Zaheer %A Csaba Szepesvari %A Lihong Li %A Mohammad Ghavamzadeh %A Craig Boutilier %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-kveton20a %I PMLR %P 2066--2076 %U https://proceedings.mlr.press/v108/kveton20a.html %V 108 %X We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
APA
Kveton, B., Zaheer, M., Szepesvari, C., Li, L., Ghavamzadeh, M. & Boutilier, C.. (2020). Randomized Exploration in Generalized Linear Bandits. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2066-2076 Available from https://proceedings.mlr.press/v108/kveton20a.html.

Related Material