A Thompson Sampling Algorithm for Cascading Bandits

Wang Chi Cheung; Vincent Tan; Zixin Zhong

A Thompson Sampling Algorithm for Cascading Bandits

Wang Chi Cheung, Vincent Tan, Zixin Zhong

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:438-447, 2019.

Abstract

We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-ã-vis existing UCB-based approaches. We also incorporate the empirical variance of each item’s click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form

$\tilde{O}(\sqrt{KLT})$ where

$L$ and

$K$ are the number of ground items and the number of items in the chosen list respectively and

$T\ge L$ is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. More importantly, it is the first theoretical guarantee on a Thompson sampling algorithm for any stochastic combinatorial bandit problem model with partial feedback. Empirical experiments demonstrate superiority of TS-Cascade compared to existing UCB-based procedures in terms of the expected cumulative regret and the time complexity.

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-cheung19a,
  title = 	 {A Thompson Sampling Algorithm for Cascading Bandits},
  author =       {Cheung, Wang Chi and Tan, Vincent and Zhong, Zixin},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {438--447},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/cheung19a/cheung19a.pdf},
  url = 	 {https://proceedings.mlr.press/v89/cheung19a.html},
  abstract = 	 {We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-ã-vis existing UCB-based approaches. We also incorporate the empirical variance of each item’s click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form $\tilde{O}(\sqrt{KLT})$ where $L$ and $K$ are the number of ground items and the number of items in the chosen list respectively and $T\ge L$ is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. More importantly, it is the first theoretical guarantee on a Thompson sampling algorithm for any stochastic combinatorial bandit problem model with partial feedback.  Empirical experiments demonstrate superiority of TS-Cascade compared to existing UCB-based procedures in terms of the expected cumulative regret and the time complexity.}
}

Endnote

%0 Conference Paper
%T A Thompson Sampling Algorithm for Cascading Bandits
%A Wang Chi Cheung
%A Vincent Tan
%A Zixin Zhong
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-cheung19a
%I PMLR
%P 438--447
%U https://proceedings.mlr.press/v89/cheung19a.html
%V 89
%X We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-ã-vis existing UCB-based approaches. We also incorporate the empirical variance of each item’s click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form $\tilde{O}(\sqrt{KLT})$ where $L$ and $K$ are the number of ground items and the number of items in the chosen list respectively and $T\ge L$ is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. More importantly, it is the first theoretical guarantee on a Thompson sampling algorithm for any stochastic combinatorial bandit problem model with partial feedback.  Empirical experiments demonstrate superiority of TS-Cascade compared to existing UCB-based procedures in terms of the expected cumulative regret and the time complexity.

APA


Cheung, W.C., Tan, V. & Zhong, Z.. (2019). A Thompson Sampling Algorithm for Cascading Bandits. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:438-447 Available from https://proceedings.mlr.press/v89/cheung19a.html.

A Thompson Sampling Algorithm for Cascading Bandits

Abstract

Cite this Paper

Related Material