Tight Bounds for Bandit Combinatorial Optimization

Alon Cohen, Tamir Hazan, Tomer Koren
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:629-642, 2017.

Abstract

We revisit the study of optimal regret rates in bandit combinatorial optimization—a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. In doing so, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
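For quick reference, the two rates contrasted in the abstract can be written out as display equations. This is only a restatement of the claim above; the symbol $R_T$ for the minimax expected regret after $T$ rounds is notation introduced here, not taken from the paper:

% Rate shown to be tight in this paper:
\[
  R_T \;=\; \widetilde{\Theta}\bigl(k^{3/2}\sqrt{dT}\bigr)
\]
% Rate conjectured by Audibert, Bubeck, and Lugosi (2013), disproved here:
\[
  R_T \;=\; \widetilde{\Theta}\bigl(k\sqrt{dT}\bigr)
\]

The two expressions differ by a multiplicative factor of $\sqrt{k}$ (since $k^{3/2}/k = k^{1/2}$), which is precisely the gap between the conjectured rate and the rate established in the paper.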

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-cohen17a,
  title     = {Tight Bounds for Bandit Combinatorial Optimization},
  author    = {Cohen, Alon and Hazan, Tamir and Koren, Tomer},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {629--642},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/cohen17a/cohen17a.pdf},
  url       = {https://proceedings.mlr.press/v65/cohen17a.html},
  abstract  = {We revisit the study of optimal regret rates in bandit combinatorial optimization—a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. In doing so, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).}
}
Endnote
%0 Conference Paper
%T Tight Bounds for Bandit Combinatorial Optimization
%A Alon Cohen
%A Tamir Hazan
%A Tomer Koren
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-cohen17a
%I PMLR
%P 629--642
%U https://proceedings.mlr.press/v65/cohen17a.html
%V 65
%X We revisit the study of optimal regret rates in bandit combinatorial optimization—a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. In doing so, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
APA
Cohen, A., Hazan, T. & Koren, T. (2017). Tight Bounds for Bandit Combinatorial Optimization. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:629-642. Available from https://proceedings.mlr.press/v65/cohen17a.html.
