Thompson Sampling with a Mixture Prior

Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7565-7586, 2022.

Abstract

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and general proof technique for analyzing the concentration of mixture distributions. We use it to derive Bayes regret bounds for MixTS in both linear bandits and finite-horizon reinforcement learning (RL). Our regret bounds reflect the structure of the mixture prior, and depend on the number of mixture components and their width. We demonstrate the empirical effectiveness of MixTS in synthetic and real-world experiments.
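The idea of initializing TS with a mixture prior can be illustrated with a small sketch. What follows is not the paper's implementation: it assumes a K-armed Gaussian bandit with independent arms and known noise variance, which is simpler than the linear bandit and RL settings analyzed in the paper. The class name `MixtureGaussianTS`, the helper `simulate`, and all parameter names and values are hypothetical. Each round, the agent samples a mixture component from its posterior weights, samples arm means from that component's posterior, and pulls the arm with the largest sampled mean.

```python
import math
import random


class MixtureGaussianTS:
    """Illustrative Thompson sampling with a mixture-of-Gaussians prior.

    Assumption (not from the paper): under component m, arm a has mean
    theta_a ~ N(means[m][a], prior_var), and rewards are N(theta_a, noise_var).
    """

    def __init__(self, weights, means, prior_var, noise_var, rng=None):
        # Mixture weights are kept in log space; they must be strictly positive.
        self.log_w = [math.log(w) for w in weights]
        self.mean = [list(m) for m in means]
        self.var = [[prior_var] * len(m) for m in means]
        self.noise_var = noise_var
        self.rng = rng or random.Random()

    def component_probs(self):
        """Posterior probability of each mixture component."""
        top = max(self.log_w)
        unnorm = [math.exp(lw - top) for lw in self.log_w]
        total = sum(unnorm)
        return [u / total for u in unnorm]

    def select_arm(self):
        """Sample a component, then arm means from it, and act greedily."""
        probs = self.component_probs()
        m = self.rng.choices(range(len(probs)), weights=probs)[0]
        samples = [
            self.rng.gauss(mu, math.sqrt(v))
            for mu, v in zip(self.mean[m], self.var[m])
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        """Update the component weights and every component's posterior for `arm`."""
        for m in range(len(self.log_w)):
            mu, v = self.mean[m][arm], self.var[m][arm]
            # Predictive density of the reward under component m.
            pred_var = v + self.noise_var
            self.log_w[m] += -0.5 * (
                math.log(2 * math.pi * pred_var) + (reward - mu) ** 2 / pred_var
            )
            # Conjugate Gaussian update of the arm's posterior.
            gain = v / pred_var
            self.mean[m][arm] = mu + gain * (reward - mu)
            self.var[m][arm] = v * (1 - gain)


def simulate(horizon=200, seed=0):
    """Run the agent on a two-armed problem that matches the second component."""
    rng = random.Random(seed)
    agent = MixtureGaussianTS(
        weights=[0.5, 0.5],
        means=[[1.0, 0.0], [0.0, 1.0]],
        prior_var=0.01,
        noise_var=0.25,
        rng=rng,
    )
    true_means = [0.0, 1.0]
    for _ in range(horizon):
        arm = agent.select_arm()
        agent.update(arm, rng.gauss(true_means[arm], math.sqrt(0.25)))
    return agent.component_probs()
```

Calling `simulate()` returns the posterior component probabilities after the run. In this toy setup the two components are well separated, so the posterior should concentrate on the component that generated the environment.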

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-hong22b,
  title = {Thompson Sampling with a Mixture Prior},
  author = {Hong, Joey and Kveton, Branislav and Zaheer, Manzil and Ghavamzadeh, Mohammad and Boutilier, Craig},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = {7565--7586},
  year = {2022},
  editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = {151},
  series = {Proceedings of Machine Learning Research},
  month = {28--30 Mar},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v151/hong22b/hong22b.pdf},
  url = {https://proceedings.mlr.press/v151/hong22b.html},
  abstract = {We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and general proof technique for analyzing the concentration of mixture distributions. We use it to derive Bayes regret bounds for MixTS in both linear bandits and finite-horizon reinforcement learning (RL). Our regret bounds reflect the structure of the mixture prior, and depend on the number of mixture components and their width. We demonstrate the empirical effectiveness of MixTS in synthetic and real-world experiments.}
}
Endnote
%0 Conference Paper
%T Thompson Sampling with a Mixture Prior
%A Joey Hong
%A Branislav Kveton
%A Manzil Zaheer
%A Mohammad Ghavamzadeh
%A Craig Boutilier
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-hong22b
%I PMLR
%P 7565--7586
%U https://proceedings.mlr.press/v151/hong22b.html
%V 151
%X We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and general proof technique for analyzing the concentration of mixture distributions. We use it to derive Bayes regret bounds for MixTS in both linear bandits and finite-horizon reinforcement learning (RL). Our regret bounds reflect the structure of the mixture prior, and depend on the number of mixture components and their width. We demonstrate the empirical effectiveness of MixTS in synthetic and real-world experiments.
APA
Hong, J., Kveton, B., Zaheer, M., Ghavamzadeh, M. & Boutilier, C. (2022). Thompson Sampling with a Mixture Prior. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:7565-7586. Available from https://proceedings.mlr.press/v151/hong22b.html.
