Thompson Sampling in Switching Environments with Bayesian Online Change Detection

Joseph Mellor; Jonathan Shapiro

Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:442-450, 2013.

Abstract

Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards, an assumption that is often unrealistic for real-world problems. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when all arms change whenever a switch occurs (Global Switching), when switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known, and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in four artificial environments and two derived from real-world data (news click-through and foreign exchange data), comparing the CTS algorithms to some other bandit algorithms. On the real-world data, CTS is the most effective.
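For readers unfamiliar with the stationary baseline the abstract starts from, the following is a minimal sketch of standard Beta-Bernoulli Thompson Sampling. It is not the paper's CTS algorithm and contains no change-point mechanism. The function name `thompson_sampling`, its parameters, and the uniform Beta(1, 1) prior are illustrative assumptions, not taken from the paper.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on a stationary bandit.

    true_probs: hypothetical per-arm success probabilities (fixed over time).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior on each arm's mean).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because the posterior counts here accumulate over the whole history, this baseline adapts slowly if the arm probabilities change, which is the limitation a switching-aware method is meant to address.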

Cite this Paper

BibTeX


@InProceedings{pmlr-v31-mellor13a,
  title = {Thompson Sampling in Switching Environments with Bayesian Online Change Detection},
  author = {Mellor, Joseph and Shapiro, Jonathan},
  booktitle = {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics},
  pages = {442--450},
  year = {2013},
  editor = {Carvalho, Carlos M. and Ravikumar, Pradeep},
  volume = {31},
  series = {Proceedings of Machine Learning Research},
  address = {Scottsdale, Arizona, USA},
  month = {29 Apr--01 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v31/mellor13a.pdf},
  url = {https://proceedings.mlr.press/v31/mellor13a.html},
  abstract = {Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS). We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective.}
}

Endnote

%0 Conference Paper
%T Thompson Sampling in Switching Environments with Bayesian Online Change Detection
%A Joseph Mellor
%A Jonathan Shapiro
%B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2013
%E Carlos M. Carvalho
%E Pradeep Ravikumar
%F pmlr-v31-mellor13a
%I PMLR
%P 442--450
%U https://proceedings.mlr.press/v31/mellor13a.html
%V 31
%X Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS).  We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective.

RIS


TY  - CPAPER
TI  - Thompson Sampling in Switching Environments with Bayesian Online Change Detection
AU  - Joseph Mellor
AU  - Jonathan Shapiro
BT  - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
DA  - 2013/04/29
ED  - Carlos M. Carvalho
ED  - Pradeep Ravikumar
ID  - pmlr-v31-mellor13a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 31
SP  - 442
EP  - 450
L1  - http://proceedings.mlr.press/v31/mellor13a.pdf
UR  - https://proceedings.mlr.press/v31/mellor13a.html
AB  - Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propose a Thompson Sampling strategy equipped with a Bayesian change point mechanism to tackle this problem. We develop algorithms for a variety of cases with constant switching rate: when switching occurs all arms change (Global Switching), switching occurs independently for each arm (Per-Arm Switching), when the switching rate is known and when it must be inferred from data. This leads to a family of algorithms we collectively term Change-Point Thompson Sampling (CTS).  We show empirical results in 4 artificial environments, and 2 derived from real world data: news click-through and foreign exchange data, comparing them to some other bandit algorithms. In real world data CTS is the most effective.  
ER  -

APA


Mellor, J. & Shapiro, J. (2013). Thompson Sampling in Switching Environments with Bayesian Online Change Detection. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:442-450. Available from https://proceedings.mlr.press/v31/mellor13a.html.
