Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments

Yevgeny Seldin, Csaba Szepesvári, Peter Auer, Yasin Abbasi-Yadkori
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:103-116, 2013.

Abstract

EXP3 is a popular algorithm for adversarial multiarmed bandits, suggested and analyzed in this setting by Auer et al. [2002b]. There has recently been increased interest in the performance of this algorithm in the stochastic setting, due to its new applications to stochastic multiarmed bandits with side information [Seldin et al., 2011] and to multiarmed bandits in the mixed stochastic-adversarial setting [Bubeck and Slivkins, 2012]. We present an empirical evaluation and improved analysis of the performance of the EXP3 algorithm in the stochastic setting, as well as a modification of the EXP3 algorithm capable of achieving “logarithmic” regret in stochastic environments.

Cite this Paper


BibTeX
@InProceedings{pmlr-v24-seldin12a,
  title     = {Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments},
  author    = {Seldin, Yevgeny and Szepesvári, Csaba and Auer, Peter and Abbasi-Yadkori, Yasin},
  booktitle = {Proceedings of the Tenth European Workshop on Reinforcement Learning},
  pages     = {103--116},
  year      = {2013},
  editor    = {Deisenroth, Marc Peter and Szepesvári, Csaba and Peters, Jan},
  volume    = {24},
  series    = {Proceedings of Machine Learning Research},
  address   = {Edinburgh, Scotland},
  month     = {30 Jun--01 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v24/seldin12a/seldin12a.pdf},
  url       = {https://proceedings.mlr.press/v24/seldin12a.html},
  abstract  = {EXP3 is a popular algorithm for adversarial multiarmed bandits, suggested and analyzed in this setting by Auer et al. [2002b]. Recently there was an increased interest in the performance of this algorithm in the stochastic setting, due to its new applications to stochastic multiarmed bandits with side information [Seldin et al., 2011] and to multiarmed bandits in the mixed stochastic-adversarial setting [Bubeck and Slivkins, 2012]. We present an empirical evaluation and improved analysis of the performance of the EXP3 algorithm in the stochastic setting, as well as a modification of the EXP3 algorithm capable of achieving “logarithmic” regret in stochastic environments.}
}
Endnote
%0 Conference Paper
%T Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
%A Yevgeny Seldin
%A Csaba Szepesvári
%A Peter Auer
%A Yasin Abbasi-Yadkori
%B Proceedings of the Tenth European Workshop on Reinforcement Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Marc Peter Deisenroth
%E Csaba Szepesvári
%E Jan Peters
%F pmlr-v24-seldin12a
%I PMLR
%P 103--116
%U https://proceedings.mlr.press/v24/seldin12a.html
%V 24
%X EXP3 is a popular algorithm for adversarial multiarmed bandits, suggested and analyzed in this setting by Auer et al. [2002b]. Recently there was an increased interest in the performance of this algorithm in the stochastic setting, due to its new applications to stochastic multiarmed bandits with side information [Seldin et al., 2011] and to multiarmed bandits in the mixed stochastic-adversarial setting [Bubeck and Slivkins, 2012]. We present an empirical evaluation and improved analysis of the performance of the EXP3 algorithm in the stochastic setting, as well as a modification of the EXP3 algorithm capable of achieving “logarithmic” regret in stochastic environments.
RIS
TY - CPAPER
TI - Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments
AU - Yevgeny Seldin
AU - Csaba Szepesvári
AU - Peter Auer
AU - Yasin Abbasi-Yadkori
BT - Proceedings of the Tenth European Workshop on Reinforcement Learning
DA - 2013/01/12
ED - Marc Peter Deisenroth
ED - Csaba Szepesvári
ED - Jan Peters
ID - pmlr-v24-seldin12a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 24
SP - 103
EP - 116
L1 - http://proceedings.mlr.press/v24/seldin12a/seldin12a.pdf
UR - https://proceedings.mlr.press/v24/seldin12a.html
AB - EXP3 is a popular algorithm for adversarial multiarmed bandits, suggested and analyzed in this setting by Auer et al. [2002b]. Recently there was an increased interest in the performance of this algorithm in the stochastic setting, due to its new applications to stochastic multiarmed bandits with side information [Seldin et al., 2011] and to multiarmed bandits in the mixed stochastic-adversarial setting [Bubeck and Slivkins, 2012]. We present an empirical evaluation and improved analysis of the performance of the EXP3 algorithm in the stochastic setting, as well as a modification of the EXP3 algorithm capable of achieving “logarithmic” regret in stochastic environments.
ER -
APA
Seldin, Y., Szepesvári, C., Auer, P. & Abbasi-Yadkori, Y. (2013). Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments. Proceedings of the Tenth European Workshop on Reinforcement Learning, in Proceedings of Machine Learning Research 24:103-116. Available from https://proceedings.mlr.press/v24/seldin12a.html.
