One Practical Algorithm for Both Stochastic and Adversarial Bandits

Yevgeny Seldin; Aleksandrs Slivkins

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1287-1295, 2014.

Abstract

We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.
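The two control levers described in the abstract can be illustrated with a minimal sketch: an EXP3-style loop over losses in which a learning rate drives the exponential weights and a separate exploration term is set per arm. This is not the authors' implementation. The function name, the `explore_fn` callback, the learning-rate schedule, and the cap on the exploration terms are all assumptions chosen for illustration; the paper itself specifies the actual schedules.

```python
import math
import random


def exp3_with_arm_exploration(loss_fn, n_arms, horizon, explore_fn, seed=0):
    """EXP3-style loop with a learning rate and per-arm exploration (illustrative).

    loss_fn(t, arm)    -> loss in [0, 1] for the arm played in round t
    explore_fn(t, arm) -> requested exploration rate for that arm (hypothetical hook)
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms

    for t in range(1, horizon + 1):
        # "Old" lever: a decreasing learning rate (assumed schedule).
        eta = 0.5 * math.sqrt(math.log(n_arms) / (t * n_arms))

        # Exponential weights, shifted by the minimum for numerical stability.
        base = min(cum_loss_est)
        weights = [math.exp(-eta * (est - base)) for est in cum_loss_est]
        total = sum(weights)
        rho = [w / total for w in weights]

        # "New" lever: one exploration rate per arm, capped (assumed cap)
        # so that the rates sum to at most 1/2.
        eps = [
            min(1.0 / (2 * n_arms), max(0.0, explore_fn(t, arm)))
            for arm in range(n_arms)
        ]
        eps_sum = sum(eps)

        # Mix the exponential weights with the per-arm exploration terms.
        probs = [(1.0 - eps_sum) * r + e for r, e in zip(rho, eps)]

        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)

        # Only the played arm's estimate is updated, reweighted by 1 / probability.
        cum_loss_est[arm] += loss / probs[arm]
        pulls[arm] += 1

    return pulls
```

For example, calling it with `loss_fn = lambda t, a: 0.0 if a == 0 else 1.0` and `explore_fn = lambda t, a: 1.0 / t` gives a setting with a deterministic gap between arm 0 and the others, in which the sketch should concentrate its plays on arm 0.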

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-seldinb14,
  title = 	 {One Practical Algorithm for Both Stochastic and Adversarial Bandits},
  author = 	 {Seldin, Yevgeny and Slivkins, Aleksandrs},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1287--1295},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/seldinb14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/seldinb14.html},
  abstract = 	 {We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.}
}

Endnote

%0 Conference Paper
%T One Practical Algorithm for Both Stochastic and Adversarial Bandits
%A Yevgeny Seldin
%A Aleksandrs Slivkins
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-seldinb14
%I PMLR
%P 1287--1295
%U https://proceedings.mlr.press/v32/seldinb14.html
%V 32
%N 2
%X We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.

RIS


TY  - CPAPER
TI  - One Practical Algorithm for Both Stochastic and Adversarial Bandits
AU  - Yevgeny Seldin
AU  - Aleksandrs Slivkins
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-seldinb14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1287
EP  - 1295
L1  - http://proceedings.mlr.press/v32/seldinb14.pdf
UR  - https://proceedings.mlr.press/v32/seldinb14.html
AB  - We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies the “old” control lever, the learning rate, to control the regret in the adversarial regime and the new control lever to detect and exploit gaps between the arm losses. This secures problem-dependent “logarithmic” regret when gaps are present without compromising on the worst-case performance guarantee in the adversarial regime. We show that the algorithm can exploit both the usual expected gaps between the arm losses in the stochastic regime and deterministic gaps between the arm losses in the adversarial regime. The algorithm retains “logarithmic” regret guarantee in the stochastic regime even when some observations are contaminated by an adversary, as long as on average the contamination does not reduce the gap by more than a half. Our results for the stochastic regime are supported by experimental validation.
ER  -

APA


Seldin, Y. & Slivkins, A. (2014). One Practical Algorithm for Both Stochastic and Adversarial Bandits. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1287-1295. Available from https://proceedings.mlr.press/v32/seldinb14.html.
