[edit]
Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:1144-1173, 2023.
Abstract
Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a wide class of structured bandit problems where the parameter space can be factorized to item-level, which covers many popular tasks. Compared with existing approaches, the proposed solution is both scalable to large systems and robust by utilizing a more flexible model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Theoretical analysis and extensive numerical results both support the usefulness of the proposed method.