Thompson Sampling via Local Uncertainty

Zhendong Wang, Mingyuan Zhou
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10115-10125, 2020.

Abstract

Thompson sampling is an efficient algorithm for sequential decision making that exploits posterior uncertainty to address the exploration-exploitation dilemma. There has been significant recent interest in integrating Bayesian neural networks into Thompson sampling. Most of these methods rely on global variable uncertainty for exploration. In this paper, we propose a new probabilistic modeling framework for Thompson sampling, in which local latent variable uncertainty is used to sample the mean reward. Variational inference is used to approximate the posterior of the local variable, and a semi-implicit structure is further introduced to enhance its expressiveness. Our experimental results on eight contextual bandit benchmark datasets show that Thompson sampling guided by local uncertainty achieves state-of-the-art performance while having low computational complexity.
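
The abstract describes the core loop of Thompson sampling: sample a plausible mean reward for each arm from a posterior distribution and pull the arm whose sample is largest. The following is a minimal, hypothetical Python sketch of that loop for a linear contextual bandit; the Gaussian noise around each arm's running estimate merely stands in for the variational, semi-implicit posterior described in the abstract and is not the authors' model. All names (true_weights, weight_estimates, the simulated environment) are illustrative assumptions.

# Hypothetical sketch: Thompson sampling on a simulated linear contextual bandit.
# Exploration comes from sampling the mean reward, not from maximizing a point estimate.
import numpy as np

rng = np.random.default_rng(0)
n_arms, context_dim, horizon = 5, 8, 2000

# Unknown ground-truth linear reward model, used only to simulate feedback.
true_weights = rng.normal(size=(n_arms, context_dim))

# Per-arm running statistics that play the role of a learned reward model
# with predictive uncertainty that shrinks as an arm is played more often.
counts = np.ones(n_arms)
weight_estimates = np.zeros((n_arms, context_dim))

cumulative_regret = 0.0
for t in range(horizon):
    context = rng.normal(size=context_dim)

    # Thompson step: sample a plausible mean reward for each arm.
    sampled_means = np.array([
        weight_estimates[a] @ context + rng.normal(scale=1.0 / np.sqrt(counts[a]))
        for a in range(n_arms)
    ])
    arm = int(np.argmax(sampled_means))

    # Observe a noisy reward and update the chosen arm's estimate online.
    reward = true_weights[arm] @ context + rng.normal(scale=0.1)
    counts[arm] += 1
    weight_estimates[arm] += (reward - weight_estimates[arm] @ context) * context / counts[arm]

    cumulative_regret += (true_weights @ context).max() - true_weights[arm] @ context

print(f"cumulative regret after {horizon} rounds: {cumulative_regret:.1f}")

The only essential ingredient illustrated here is the Thompson step: randomness in the sampled mean rewards drives exploration, and it decays as the posterior concentrates.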

Cite this Paper

BibTeX
@InProceedings{pmlr-v119-wang20ab,
  title     = {Thompson Sampling via Local Uncertainty},
  author    = {Wang, Zhendong and Zhou, Mingyuan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {10115--10125},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/wang20ab/wang20ab.pdf},
  url       = {https://proceedings.mlr.press/v119/wang20ab.html}
}
Endnote
%0 Conference Paper
%T Thompson Sampling via Local Uncertainty
%A Zhendong Wang
%A Mingyuan Zhou
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-wang20ab
%I PMLR
%P 10115--10125
%U https://proceedings.mlr.press/v119/wang20ab.html
%V 119
APA
Wang, Z. & Zhou, M. (2020). Thompson Sampling via Local Uncertainty. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10115-10125. Available from https://proceedings.mlr.press/v119/wang20ab.html.
