Bandit Convex Optimization: √T Regret in One Dimension

Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres
Proceedings of The 28th Conference on Learning Theory, PMLR 40:266-278, 2015.

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is Θ̃(√T) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
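The paper's √T result is non-constructive, so no concrete algorithm from the paper can be reproduced here. As a rough, generic illustration of the Thompson Sampling principle the abstract refers to, the sketch below runs standard Gaussian Thompson Sampling over a discretized one-dimensional action set against a fixed convex loss with Gaussian observation noise. All names (`thompson_sampling_1d`, the discretization, the conjugate-Gaussian posterior) are assumptions for illustration, not the paper's variant.

```python
import math
import random

def thompson_sampling_1d(loss_fn, horizon, n_arms=16, noise_sd=0.1, seed=0):
    """Generic Gaussian Thompson Sampling on a discretization of [0, 1].

    Illustrative sketch only: a stochastic multi-armed-bandit setup,
    not the adversarial/Bayesian construction analyzed in the paper.
    `loss_fn` is a fixed convex loss observed with Gaussian noise.
    Returns regret against the best arm in the discretized set.
    """
    rng = random.Random(seed)
    arms = [i / (n_arms - 1) for i in range(n_arms)]
    # Gaussian posterior over each arm's mean loss: N(mu[i], 1/prec[i]).
    mu = [0.0] * n_arms
    prec = [1.0] * n_arms  # prior precision
    total_loss = 0.0
    for _ in range(horizon):
        # Sample one mean-loss estimate per arm; play the sampled minimizer.
        samples = [rng.gauss(mu[i], 1.0 / math.sqrt(prec[i]))
                   for i in range(n_arms)]
        i = min(range(n_arms), key=lambda j: samples[j])
        y = loss_fn(arms[i]) + rng.gauss(0.0, noise_sd)
        total_loss += y
        # Conjugate Gaussian update with known noise variance.
        obs_prec = 1.0 / noise_sd ** 2
        mu[i] = (prec[i] * mu[i] + obs_prec * y) / (prec[i] + obs_prec)
        prec[i] += obs_prec
    best = min(loss_fn(x) for x in arms)
    return total_loss - horizon * best

# Example: a convex loss minimized inside the interval.
regret = thompson_sampling_1d(lambda x: (x - 0.3) ** 2, horizon=2000)
```

With a fixed convex loss the sampled-posterior play concentrates on near-optimal arms, so average regret per round shrinks over time; the adversarial setting the paper studies is much harder, which is what makes its Θ̃(√T) bound notable.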

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Bubeck15a,
  title     = {Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension},
  author    = {Sébastien Bubeck and Ofer Dekel and Tomer Koren and Yuval Peres},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages     = {266--278},
  year      = {2015},
  editor    = {Peter Grünwald and Elad Hazan and Satyen Kale},
  volume    = {40},
  series    = {Proceedings of Machine Learning Research},
  address   = {Paris, France},
  month     = {03--06 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v40/Bubeck15a.pdf},
  url       = {http://proceedings.mlr.press/v40/Bubeck15a.html},
  abstract  = {We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a ``local-to-global'' property of convex functions, that may be of independent interest.}
}
Endnote
%0 Conference Paper
%T Bandit Convex Optimization: √T Regret in One Dimension
%A Sébastien Bubeck
%A Ofer Dekel
%A Tomer Koren
%A Yuval Peres
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale
%F pmlr-v40-Bubeck15a
%I PMLR
%J Proceedings of Machine Learning Research
%P 266--278
%U http://proceedings.mlr.press
%V 40
%W PMLR
%X We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is Θ̃(√T) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
RIS
TY  - CPAPER
TI  - Bandit Convex Optimization: √T Regret in One Dimension
AU  - Sébastien Bubeck
AU  - Ofer Dekel
AU  - Tomer Koren
AU  - Yuval Peres
BT  - Proceedings of The 28th Conference on Learning Theory
PY  - 2015/06/26
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale
ID  - pmlr-v40-Bubeck15a
PB  - PMLR
DP  - PMLR
SP  - 266
EP  - 278
L1  - http://proceedings.mlr.press/v40/Bubeck15a.pdf
UR  - http://proceedings.mlr.press/v40/Bubeck15a.html
AB  - We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is Θ̃(√T) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
ER  -
APA
Bubeck, S., Dekel, O., Koren, T. & Peres, Y. (2015). Bandit Convex Optimization: √T Regret in One Dimension. Proceedings of The 28th Conference on Learning Theory, in PMLR 40:266-278.