Mixed-Effect Thompson Sampling

Imad Aouali, Branislav Kveton, Sumeet Katariya
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2087-2115, 2023.

Abstract

A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we propose Mixed-Effect Thompson Sampling (meTS) and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform well empirically and show the generality of the proposed framework.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-aouali23a,
  title = {Mixed-Effect Thompson Sampling},
  author = {Aouali, Imad and Kveton, Branislav and Katariya, Sumeet},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {2087--2115},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/aouali23a/aouali23a.pdf},
  url = {https://proceedings.mlr.press/v206/aouali23a.html},
  abstract = {A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we propose Mixed-Effect Thompson Sampling (meTS) and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform well empirically and show the generality of the proposed framework.}
}
Endnote
%0 Conference Paper
%T Mixed-Effect Thompson Sampling
%A Imad Aouali
%A Branislav Kveton
%A Sumeet Katariya
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-aouali23a
%I PMLR
%P 2087--2115
%U https://proceedings.mlr.press/v206/aouali23a.html
%V 206
%X A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we propose Mixed-Effect Thompson Sampling (meTS) and bound its Bayes regret. The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters. The terms reflect the structure of our model and the quality of priors. Our theoretical findings are validated empirically using both synthetic and real-world problems. We also propose numerous extensions of practical interest. While they do not come with guarantees, they perform well empirically and show the generality of the proposed framework.
APA
Aouali, I., Kveton, B. & Katariya, S. (2023). Mixed-Effect Thompson Sampling. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2087-2115. Available from https://proceedings.mlr.press/v206/aouali23a.html.
