Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres
Proceedings of The 28th Conference on Learning Theory, PMLR 40:266-278, 2015.

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
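For concreteness, the regret quantity that the $\widetilde{\Theta}(\sqrt{T})$ bound refers to is the standard adversarial one; the notation below is ours, as a sketch assuming losses normalized to $[0,1]$ and the unit interval as the one-dimensional domain:

\[
R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in [0,1]} \sum_{t=1}^{T} f_t(x),
\]

where $x_t$ is the point played at round $t$, the only feedback observed is the bandit value $f_t(x_t)$, and $f_1, \dots, f_T$ are convex loss functions chosen by the adversary. The minimax regret is the best achievable worst-case expected value of $R_T$ over randomized player strategies; the paper shows this value is $\widetilde{\Theta}(\sqrt{T})$ in one dimension.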

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Bubeck15a,
  title = {Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension},
  author = {Bubeck, Sébastien and Dekel, Ofer and Koren, Tomer and Peres, Yuval},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages = {266--278},
  year = {2015},
  editor = {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = {40},
  series = {Proceedings of Machine Learning Research},
  address = {Paris, France},
  month = {03--06 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v40/Bubeck15a.pdf},
  url = {https://proceedings.mlr.press/v40/Bubeck15a.html},
  abstract = {We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.}
}
Endnote
%0 Conference Paper
%T Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension
%A Sébastien Bubeck
%A Ofer Dekel
%A Tomer Koren
%A Yuval Peres
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale
%F pmlr-v40-Bubeck15a
%I PMLR
%P 266--278
%U https://proceedings.mlr.press/v40/Bubeck15a.html
%V 40
%X We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
RIS
TY - CPAPER
TI - Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension
AU - Sébastien Bubeck
AU - Ofer Dekel
AU - Tomer Koren
AU - Yuval Peres
BT - Proceedings of The 28th Conference on Learning Theory
DA - 2015/06/26
ED - Peter Grünwald
ED - Elad Hazan
ED - Satyen Kale
ID - pmlr-v40-Bubeck15a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 40
SP - 266
EP - 278
L1 - http://proceedings.mlr.press/v40/Bubeck15a.pdf
UR - https://proceedings.mlr.press/v40/Bubeck15a.html
AB - We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
ER -
APA
Bubeck, S., Dekel, O., Koren, T. & Peres, Y. (2015). Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:266-278. Available from https://proceedings.mlr.press/v40/Bubeck15a.html.