Thompson Sampling in Switching Environments with Bayesian Online Change Detection

Joseph Mellor, Jonathan Shapiro
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:442-450, 2013.

Abstract

Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective.

Cite this Paper


BibTeX
@InProceedings{pmlr-v31-mellor13a, title = {Thompson Sampling in Switching Environments with Bayesian Online Change Detection}, author = {Mellor, Joseph and Shapiro, Jonathan}, booktitle = {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics}, pages = {442--450}, year = {2013}, editor = {Carvalho, Carlos M. and Ravikumar, Pradeep}, volume = {31}, series = {Proceedings of Machine Learning Research}, address = {Scottsdale, Arizona, USA}, month = {29 Apr--01 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v31/mellor13a.pdf}, url = {https://proceedings.mlr.press/v31/mellor13a.html}, abstract = {Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective. } }
Endnote
%0 Conference Paper %T Thompson Sampling in Switching Environments with Bayesian Online Change Detection %A Joseph Mellor %A Jonathan Shapiro %B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2013 %E Carlos M. Carvalho %E Pradeep Ravikumar %F pmlr-v31-mellor13a %I PMLR %P 442--450 %U https://proceedings.mlr.press/v31/mellor13a.html %V 31 %X Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective.
RIS
TY - CPAPER TI - Thompson Sampling in Switching Environments with Bayesian Online Change Detection AU - Joseph Mellor AU - Jonathan Shapiro BT - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics DA - 2013/04/29 ED - Carlos M. Carvalho ED - Pradeep Ravikumar ID - pmlr-v31-mellor13a PB - PMLR DP - Proceedings of Machine Learning Research VL - 31 SP - 442 EP - 450 L1 - http://proceedings.mlr.press/v31/mellor13a.pdf UR - https://proceedings.mlr.press/v31/mellor13a.html AB - Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective. ER -
APA
Mellor, J. & Shapiro, J.. (2013). Thompson Sampling in Switching Environments with Bayesian Online Change Detection. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:442-450 Available from https://proceedings.mlr.press/v31/mellor13a.html.

Related Material