Perturbed-History Exploration in Stochastic Linear Bandits

Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:530-540, 2020.

Abstract

We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive an $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.
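
For intuition, a minimal sketch of the LinPHE loop in Python follows. This is a hedged illustration, not the paper's exact pseudocode: it assumes rewards in [0, 1], Bernoulli(1/2) pseudo-rewards, an integer perturbation scale a, and a ridge regularizer lam; the function and parameter names (lin_phe, pull, seed) are hypothetical.

import numpy as np

def lin_phe(arms, pull, n, a=1, lam=1.0, seed=None):
    # arms: (K, d) array of arm feature vectors.
    # pull: callable mapping an arm index to a stochastic reward in [0, 1].
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    xs, ys = [], []  # features of pulled arms and observed rewards so far
    for t in range(n):
        if not xs:
            i = int(rng.integers(K))  # no history yet: pull an arbitrary arm
        else:
            X = np.array(xs)
            y = np.array(ys)
            # Perturbed history: for each past observation, add `a` i.i.d.
            # Bernoulli(1/2) pseudo-rewards attached to the same features.
            z = rng.binomial(1, 0.5, size=(len(ys), a)).sum(axis=1)
            # Each feature vector appears a+1 times in the perturbed design,
            # giving the regularized Gram matrix below.
            G = (a + 1) * X.T @ X + lam * np.eye(d)
            theta = np.linalg.solve(G, X.T @ (y + z))  # ridge fit on perturbed history
            i = int(np.argmax(arms @ theta))  # act greedily w.r.t. the perturbed model
        xs.append(arms[i])
        ys.append(pull(i))
    return np.array(ys)

Each round refits the linear model to a history augmented with fresh random pseudo-rewards, so the resulting estimate theta is itself random and the greedy argmax explores. Under the same assumptions, the logistic generalization mentioned in the abstract would replace the ridge-regression solve with a regularized logistic-regression fit on the perturbed history.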

Cite this Paper

BibTeX
@InProceedings{pmlr-v115-kveton20a,
  title     = {Perturbed-History Exploration in Stochastic Linear Bandits},
  author    = {Kveton, Branislav and Szepesv{\'{a}}ri, Csaba and Ghavamzadeh, Mohammad and Boutilier, Craig},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {530--540},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/kveton20a/kveton20a.pdf},
  url       = {https://proceedings.mlr.press/v115/kveton20a.html}
}
Endnote
%0 Conference Paper
%T Perturbed-History Exploration in Stochastic Linear Bandits
%A Branislav Kveton
%A Csaba Szepesvári
%A Mohammad Ghavamzadeh
%A Craig Boutilier
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-kveton20a
%I PMLR
%P 530--540
%U https://proceedings.mlr.press/v115/kveton20a.html
%V 115
APA
Kveton, B., Szepesvári, C., Ghavamzadeh, M. & Boutilier, C. (2020). Perturbed-History Exploration in Stochastic Linear Bandits. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:530-540. Available from https://proceedings.mlr.press/v115/kveton20a.html.