Reactive bandits with attitude

Pedro Ortega; Kee-Eung Kim; Daniel Lee

Reactive bandits with attitude

Pedro Ortega, Kee-Eung Kim, Daniel Lee

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:726-734, 2015.

Abstract

We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the “attitude” of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling. Results show the qualitative difference in policy pseudo-regret between our proposed strategy and other well-known bandit algorithms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-ortega15,
  title = 	 {{Reactive bandits with attitude}},
  author = 	 {Ortega, Pedro and Kim, Kee-Eung and Lee, Daniel},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {726--734},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/ortega15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/ortega15.html},
  abstract = 	 {We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the “attitude” of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling.  Results show the qualitative difference in policy pseudo-regret between our proposed strategy and other well-known bandit algorithms.}
}

Endnote

%0 Conference Paper
%T Reactive bandits with attitude
%A Pedro Ortega
%A Kee-Eung Kim
%A Daniel Lee
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan	
%F pmlr-v38-ortega15
%I PMLR
%P 726--734
%U https://proceedings.mlr.press/v38/ortega15.html
%V 38
%X We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the “attitude” of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling.  Results show the qualitative difference in policy pseudo-regret between our proposed strategy and other well-known bandit algorithms.

RIS


TY  - CPAPER
TI  - Reactive bandits with attitude
AU  - Pedro Ortega
AU  - Kee-Eung Kim
AU  - Daniel Lee
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan	
ID  - pmlr-v38-ortega15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 726
EP  - 734
L1  - http://proceedings.mlr.press/v38/ortega15.pdf
UR  - https://proceedings.mlr.press/v38/ortega15.html
AB  - We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the “attitude” of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling.  Results show the qualitative difference in policy pseudo-regret between our proposed strategy and other well-known bandit algorithms.
ER  -

APA


Ortega, P., Kim, K. & Lee, D.. (2015). Reactive bandits with attitude. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:726-734 Available from https://proceedings.mlr.press/v38/ortega15.html.

Related Material

Download PDF