Bandit Convex Optimization: \sqrtT Regret in One Dimension

Sébastien Bubeck; Ofer Dekel; Tomer Koren; Yuval Peres

Bandit Convex Optimization: \sqrtT Regret in One Dimension

Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

Proceedings of The 28th Conference on Learning Theory, PMLR 40:266-278, 2015.

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is \widetildeΘ(\sqrtT) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Bubeck15a,
  title = 	 {Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension},
  author = 	 {Bubeck, Sébastien and Dekel, Ofer and Koren, Tomer and Peres, Yuval},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {266--278},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Bubeck15a.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Bubeck15a.html},
  abstract = 	 {We analyze the minimax regret of the adversarial bandit convex optimization problem.  Focusing on the one-dimensional case, we prove that the minimax regret is \widetildeΘ(\sqrtT) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest. }
}

Endnote

%0 Conference Paper
%T Bandit Convex Optimization: \sqrtT Regret in One Dimension
%A Sébastien Bubeck
%A Ofer Dekel
%A Tomer Koren
%A Yuval Peres
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Bubeck15a
%I PMLR
%P 266--278
%U https://proceedings.mlr.press/v40/Bubeck15a.html
%V 40
%X We analyze the minimax regret of the adversarial bandit convex optimization problem.  Focusing on the one-dimensional case, we prove that the minimax regret is \widetildeΘ(\sqrtT) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.

RIS


TY  - CPAPER
TI  - Bandit Convex Optimization: \sqrtT Regret in One Dimension
AU  - Sébastien Bubeck
AU  - Ofer Dekel
AU  - Tomer Koren
AU  - Yuval Peres
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Bubeck15a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 266
EP  - 278
L1  - http://proceedings.mlr.press/v40/Bubeck15a.pdf
UR  - https://proceedings.mlr.press/v40/Bubeck15a.html
AB  - We analyze the minimax regret of the adversarial bandit convex optimization problem.  Focusing on the one-dimensional case, we prove that the minimax regret is \widetildeΘ(\sqrtT) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest. 
ER  -

APA


Bubeck, S., Dekel, O., Koren, T. & Peres, Y.. (2015). Bandit Convex Optimization: \sqrtT Regret in One Dimension. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:266-278 Available from https://proceedings.mlr.press/v40/Bubeck15a.html.

Related Material

Download PDF