Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

Julian Zimmert; Haipeng Luo; Chen-Yu Wei

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7683-7692, 2019.

Abstract

We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of (Zimmert & Seldin, 2019) for the special case of multi-armed bandits, but importantly require a novel hybrid regularizer designed specifically for semi-bandits. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. We finally provide a preliminary extension of our results to full bandit feedback.
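For orientation, the multi-armed bandit special case that the abstract builds on (Zimmert & Seldin, 2019) plays follow-the-regularized-leader (FTRL) with a 1/2-Tsallis entropy regularizer. The sketch below illustrates only that K-armed case, not this paper's semi-bandit algorithm or its hybrid regularizer, whose exact form is not given on this page. Several details are assumptions made for illustration: the regularizer scaling $-(2/\eta)\sum_i \sqrt{w_i}$, the learning rate $\eta_t = 1/\sqrt{t}$, the plain importance-weighted loss estimator, and Bernoulli losses. The function names `tsallis_weights` and `run_tsallis_inf` are hypothetical.

```python
import math
import random


def tsallis_weights(cum_loss, eta, iters=100):
    """FTRL weights for the (assumed) regularizer -(2/eta) * sum(sqrt(w_i)).

    Stationarity of the Lagrangian gives w_i = 1 / (eta * (L_i + lam))**2;
    the normalizer lam is found by bisection so the weights sum to 1.
    """
    k = len(cum_loss)
    l_min = min(cum_loss)
    # At lam = -l_min + 1/eta the leading arm alone has weight 1 (sum >= 1);
    # at lam = -l_min + sqrt(k)/eta every weight is <= 1/k (sum <= 1).
    lo = -l_min + 1.0 / eta
    hi = -l_min + math.sqrt(k) / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(1.0 / (eta * (l + mid)) ** 2 for l in cum_loss)
        if total > 1.0:
            lo = mid  # weights too large: move lam up
        else:
            hi = mid  # weights too small (or exact): move lam down
    lam = 0.5 * (lo + hi)
    w = [1.0 / (eta * (l + lam)) ** 2 for l in cum_loss]
    s = sum(w)
    return [x / s for x in w]  # remove residual bisection error


def run_tsallis_inf(means, horizon, seed=0):
    """Simulate Bernoulli-loss arms under the sketch above; return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    cum_loss_est = [0.0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        w = tsallis_weights(cum_loss_est, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = 1.0 if rng.random() < means[arm] else 0.0
        cum_loss_est[arm] += loss / w[arm]  # importance-weighted estimate
        pulls[arm] += 1
    return pulls
```

For example, `run_tsallis_inf([0.2, 0.5, 0.5], 2000)` should concentrate most pulls on the first arm, the one with the smallest mean loss. The same code runs unchanged on adversarially chosen losses, which is the "best of both worlds" property the abstract describes. Bisection is used for the normalizer because it is simple and provably bracketed. A Newton step on the same equation converges faster.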

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-zimmert19a,
  title = 	 {Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously},
  author =       {Zimmert, Julian and Luo, Haipeng and Wei, Chen-Yu},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {7683--7692},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/zimmert19a/zimmert19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/zimmert19a.html},
  abstract = 	 {We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of (Zimmert & Seldin, 2019) for the special case of multi-armed bandits, but importantly requires a novel hybrid regularizer designed specifically for semi-bandit. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. We finally provide a preliminary extension of our results to the full bandit feedback.}
}

Endnote

%0 Conference Paper
%T Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
%A Julian Zimmert
%A Haipeng Luo
%A Chen-Yu Wei
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zimmert19a
%I PMLR
%P 7683--7692
%U https://proceedings.mlr.press/v97/zimmert19a.html
%V 97
%X We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of (Zimmert & Seldin, 2019) for the special case of multi-armed bandits, but importantly requires a novel hybrid regularizer designed specifically for semi-bandit. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. We finally provide a preliminary extension of our results to the full bandit feedback.

APA

Zimmert, J., Luo, H. & Wei, C. (2019). Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7683-7692. Available from https://proceedings.mlr.press/v97/zimmert19a.html.
