On Approximate Thompson Sampling with Langevin Algorithms

Eric Mazumdar; Aldo Pacchiano; Yian Ma; Michael Jordan; Peter Bartlett

On Approximate Thompson Sampling with Langevin Algorithms

Eric Mazumdar, Aldo Pacchiano, Yian Ma, Michael Jordan, Peter Bartlett

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6797-6807, 2020.

Abstract

Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, its wider deployment is restricted due to a significant computational limitation: the need for samples from posterior distributions at every iteration. In practice, this limitation is alleviated by making use of approximate sampling methods, yet provably incorporating approximate samples into Thompson Sampling algorithms remains an open problem. In this work we address this by proposing two efficient Langevin MCMC algorithms tailored to Thompson sampling. The resulting approximate Thompson Sampling algorithms are efficiently implementable and provably achieve optimal instance-dependent regret for the Multi-Armed Bandit (MAB) problem. To prove these results we derive novel posterior concentration bounds and MCMC convergence rates for log-concave distributions which may be of independent interest.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-mazumdar20a,
  title = 	 {On Approximate Thompson Sampling with {L}angevin Algorithms},
  author =       {Mazumdar, Eric and Pacchiano, Aldo and Ma, Yian and Jordan, Michael and Bartlett, Peter},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {6797--6807},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/mazumdar20a/mazumdar20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/mazumdar20a.html},
  abstract = 	 {Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, its wider deployment is restricted due to a significant computational limitation: the need for samples from posterior distributions at every iteration. In practice, this limitation is alleviated by making use of approximate sampling methods, yet provably incorporating approximate samples into Thompson Sampling algorithms remains an open problem. In this work we address this by proposing two efficient Langevin MCMC algorithms tailored to Thompson sampling. The resulting approximate Thompson Sampling algorithms are efficiently implementable and provably achieve optimal instance-dependent regret for the Multi-Armed Bandit (MAB) problem. To prove these results we derive novel posterior concentration bounds and MCMC convergence rates for log-concave distributions which may be of independent interest.}
}

Endnote

%0 Conference Paper
%T On Approximate Thompson Sampling with Langevin Algorithms
%A Eric Mazumdar
%A Aldo Pacchiano
%A Yian Ma
%A Michael Jordan
%A Peter Bartlett
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-mazumdar20a
%I PMLR
%P 6797--6807
%U https://proceedings.mlr.press/v119/mazumdar20a.html
%V 119
%X Thompson sampling for multi-armed bandit problems is known to enjoy favorable performance in both theory and practice. However, its wider deployment is restricted due to a significant computational limitation: the need for samples from posterior distributions at every iteration. In practice, this limitation is alleviated by making use of approximate sampling methods, yet provably incorporating approximate samples into Thompson Sampling algorithms remains an open problem. In this work we address this by proposing two efficient Langevin MCMC algorithms tailored to Thompson sampling. The resulting approximate Thompson Sampling algorithms are efficiently implementable and provably achieve optimal instance-dependent regret for the Multi-Armed Bandit (MAB) problem. To prove these results we derive novel posterior concentration bounds and MCMC convergence rates for log-concave distributions which may be of independent interest.

APA

Mazumdar, E., Pacchiano, A., Ma, Y., Jordan, M. & Bartlett, P.. (2020). On Approximate Thompson Sampling with Langevin Algorithms. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6797-6807 Available from https://proceedings.mlr.press/v119/mazumdar20a.html.

On Approximate Thompson Sampling with Langevin Algorithms

Abstract

Cite this Paper

Related Material