Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres
Proceedings of The 28th Conference on Learning Theory, PMLR 40:266-278, 2015.

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
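
For concreteness, the quantities named in the abstract can be sketched as follows. The notation ($\mathcal{K}$, $f_t$, $x_t$, $R_T$, $\mathcal{A}$, $\mu$) is assumed here for illustration and does not appear in the abstract. On each round $t = 1, \dots, T$ the player picks a point $x_t$ in a convex set $\mathcal{K}$ (in one dimension, an interval such as $[0,1]$), the adversary picks a convex loss $f_t$, and the player observes only the value $f_t(x_t)$. The expected regret is

$$
R_T = \mathbb{E}\left[ \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x) \right].
$$

Roughly speaking, and under suitable regularity conditions, the minimax duality step swaps the order of the player, who chooses a strategy $\mathcal{A}$, and the adversary, who in the Bayesian setting draws the loss sequence from a prior distribution $\mu$:

$$
\inf_{\mathcal{A}} \, \sup_{f_1, \dots, f_T} R_T
\;=\;
\sup_{\mu} \, \inf_{\mathcal{A}} \, \mathbb{E}_{(f_1, \dots, f_T) \sim \mu} \left[ R_T \right].
$$

Bounding the right-hand side by $\widetilde{O}(\sqrt{T})$ for every prior $\mu$ therefore bounds the minimax regret without exhibiting an explicit algorithm for the adversarial setting, which is why the analysis is non-constructive.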
