Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Runzhe Wan, Lin Ge, Rui Song
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:1144-1173, 2023.

Abstract

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a wide class of structured bandit problems where the parameter space can be factorized to item-level, which covers many popular tasks. Compared with existing approaches, the proposed solution is both scalable to large systems and robust by utilizing a more flexible model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Theoretical analysis and extensive numerical results both support the usefulness of the proposed method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-wan23a,
  title     = {Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework},
  author    = {Wan, Runzhe and Ge, Lin and Song, Rui},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {1144--1173},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/wan23a/wan23a.pdf},
  url       = {https://proceedings.mlr.press/v206/wan23a.html},
  abstract  = {Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a wide class of structured bandit problems where the parameter space can be factorized to item-level, which covers many popular tasks. Compared with existing approaches, the proposed solution is both scalable to large systems and robust by utilizing a more flexible model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Theoretical analysis and extensive numerical results both support the usefulness of the proposed method.}
}
Endnote
%0 Conference Paper
%T Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework
%A Runzhe Wan
%A Lin Ge
%A Rui Song
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-wan23a
%I PMLR
%P 1144--1173
%U https://proceedings.mlr.press/v206/wan23a.html
%V 206
%X Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a wide class of structured bandit problems where the parameter space can be factorized to item-level, which covers many popular tasks. Compared with existing approaches, the proposed solution is both scalable to large systems and robust by utilizing a more flexible model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Theoretical analysis and extensive numerical results both support the usefulness of the proposed method.
APA
Wan, R., Ge, L. & Song, R. (2023). Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:1144-1173. Available from https://proceedings.mlr.press/v206/wan23a.html.