Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits

Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:3979-4008, 2023.

Abstract

We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.
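To illustrate the setting the abstract describes, here is a minimal sketch of linear Thompson sampling on a sparse parameter. Note this uses a plain Gaussian prior with an exact conjugate posterior, not the paper's sparsity-inducing (spike-and-slab) prior or its variational approximation; all dimensions and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 20, 5, 500   # feature dimension, arms per round, horizon
s = 3                  # sparsity: only s coordinates of theta are nonzero
theta = np.zeros(d)
theta[:s] = rng.normal(size=s)

# Gaussian-conjugate posterior state (identity prior, unit noise scale)
B = np.eye(d)          # posterior precision matrix
f = np.zeros(d)        # accumulated x_t * r_t

regret = 0.0
for t in range(T):
    X = rng.normal(size=(K, d))                # observed contexts, one per arm
    mu = np.linalg.solve(B, f)                 # posterior mean
    cov = np.linalg.inv(B)                     # posterior covariance
    theta_tilde = rng.multivariate_normal(mu, cov)  # Thompson sample
    a = int(np.argmax(X @ theta_tilde))        # act greedily on the sample
    r = X[a] @ theta + 0.1 * rng.normal()      # noisy linear reward
    regret += np.max(X @ theta) - X[a] @ theta
    B += np.outer(X[a], X[a])                  # rank-one posterior update
    f += r * X[a]

print(round(regret, 2))
```

The paper's algorithm replaces the Gaussian prior above with a sparsity-inducing prior, whose posterior is no longer available in closed form; that is what motivates the variational-inference step mentioned in the abstract.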

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chakraborty23b,
  title = {Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits},
  author = {Chakraborty, Sunrit and Roy, Saptarshi and Tewari, Ambuj},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {3979--4008},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/chakraborty23b/chakraborty23b.pdf},
  url = {https://proceedings.mlr.press/v202/chakraborty23b.html},
  abstract = {We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.}
}
Endnote
%0 Conference Paper
%T Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
%A Sunrit Chakraborty
%A Saptarshi Roy
%A Ambuj Tewari
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chakraborty23b
%I PMLR
%P 3979--4008
%U https://proceedings.mlr.press/v202/chakraborty23b.html
%V 202
%X We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.
APA
Chakraborty, S., Roy, S., & Tewari, A. (2023). Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:3979-4008. Available from https://proceedings.mlr.press/v202/chakraborty23b.html.