[edit]
Improved Regret for Zeroth-Order Stochastic Convex Bandits
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2938-2964, 2021.
Abstract
We present an efficient algorithm for stochastic bandit convex optimisation with no assumptions on smoothness or strong convexity and for which the regret is bounded by O(d^(4.5) sqrt(n) polylog(n)), where n is the number of interactions and d is the dimension.