Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

Julian Zimmert, Haipeng Luo, Chen-Yu Wei
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7683-7692, 2019.

Abstract

We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of Zimmert & Seldin (2019) for the special case of multi-armed bandits, but importantly require a novel hybrid regularizer designed specifically for the semi-bandit setting. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. Finally, we provide a preliminary extension of our results to the full-bandit-feedback setting.
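
To make the follow-the-regularized-leader (FTRL) mechanics concrete, the following is a minimal NumPy sketch of the multi-armed-bandit special case that the paper extends: Tsallis-INF (Zimmert & Seldin, 2019), i.e. FTRL with the 1/2-Tsallis-entropy regularizer, learning rate eta_t = 1/sqrt(t), and importance-weighted loss estimates. The bisection solver and the loss_fn interface are illustrative assumptions, not code from the paper; the paper's semi-bandit algorithm further replaces the Tsallis entropy with a hybrid regularizer over the combinatorial action set, which this sketch does not implement.

import numpy as np

def tsallis_inf_weights(L_hat, eta, n_iter=60):
    # FTRL step with the 1/2-Tsallis-entropy regularizer:
    #   x = argmin_{x in simplex} <x, L_hat> - (1/eta) * sum_i sqrt(x_i).
    # Stationarity gives x_i = 1 / (4 eta^2 (L_hat_i - Z)^2) for a
    # Lagrange multiplier Z < min_i L_hat_i, found here by bisection.
    K = len(L_hat)
    lo = L_hat.min() - np.sqrt(K) / (2.0 * eta)  # at this Z, sum(x) <= 1
    hi = L_hat.min() - 1e-12                     # sum(x) blows up near min(L_hat)
    for _ in range(n_iter):
        Z = 0.5 * (lo + hi)
        x = 1.0 / (4.0 * eta ** 2 * (L_hat - Z) ** 2)
        if x.sum() > 1.0:
            hi = Z   # Z too close to min(L_hat): total mass too large
        else:
            lo = Z
    return x / x.sum()

def tsallis_inf(loss_fn, K, T, seed=0):
    # Anytime Tsallis-INF: eta_t = 1/sqrt(t), importance-weighted estimates.
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)  # cumulative loss estimates
    for t in range(1, T + 1):
        x = tsallis_inf_weights(L_hat, eta=1.0 / np.sqrt(t))
        arm = int(rng.choice(K, p=x))
        loss = loss_fn(t, arm)        # observed loss of the played arm, in [0, 1]
        L_hat[arm] += loss / x[arm]   # unbiased importance-weighted estimate
    return L_hat

Against a stochastic loss_fn (e.g. Bernoulli arms with distinct means) this update attains logarithmic pseudo-regret, while the same code, unchanged, retains the $\mathcal{O}(\sqrt{T})$ adversarial guarantee; the semi-bandit extension in the paper plays subsets of arms and runs the analogous FTRL step over the corresponding polytope.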

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-zimmert19a,
  title     = {Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously},
  author    = {Zimmert, Julian and Luo, Haipeng and Wei, Chen-Yu},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7683--7692},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/zimmert19a/zimmert19a.pdf},
  url       = {https://proceedings.mlr.press/v97/zimmert19a.html}
}
Endnote
%0 Conference Paper
%T Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
%A Julian Zimmert
%A Haipeng Luo
%A Chen-Yu Wei
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zimmert19a
%I PMLR
%P 7683--7692
%U https://proceedings.mlr.press/v97/zimmert19a.html
%V 97
APA
Zimmert, J., Luo, H. & Wei, C.-Y. (2019). Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7683-7692. Available from https://proceedings.mlr.press/v97/zimmert19a.html.
