Graphical Models Meet Bandits: A Variational Thompson Sampling Approach

Tong Yu; Branislav Kveton; Zheng Wen; Ruiyi Zhang; Ole J. Mengshoel

Graphical Models Meet Bandits: A Variational Thompson Sampling Approach

Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10902-10912, 2020.

Abstract

We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework uses a graphical model to capture complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-yu20b,
  title = 	 {Graphical Models Meet Bandits: A Variational Thompson Sampling Approach},
  author =       {Yu, Tong and Kveton, Branislav and Wen, Zheng and Zhang, Ruiyi and Mengshoel, Ole J.},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {10902--10912},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/yu20b/yu20b.pdf},
  url = 	 {https://proceedings.mlr.press/v119/yu20b.html},
  abstract = 	 {We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework uses a graphical model to capture complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.}
}

Endnote

%0 Conference Paper
%T Graphical Models Meet Bandits: A Variational Thompson Sampling Approach
%A Tong Yu
%A Branislav Kveton
%A Zheng Wen
%A Ruiyi Zhang
%A Ole J. Mengshoel
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-yu20b
%I PMLR
%P 10902--10912
%U https://proceedings.mlr.press/v119/yu20b.html
%V 119
%X We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework uses a graphical model to capture complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.

APA

Yu, T., Kveton, B., Wen, Z., Zhang, R. & Mengshoel, O.J.. (2020). Graphical Models Meet Bandits: A Variational Thompson Sampling Approach. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10902-10912 Available from https://proceedings.mlr.press/v119/yu20b.html.

Graphical Models Meet Bandits: A Variational Thompson Sampling Approach

Abstract

Cite this Paper

Related Material