[edit]
Online Newton Method for Bandit Convex Optimisation Extended Abstract
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1713-1714, 2024.
Abstract
We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most d3.5√npolylog(n,d) with high probability where d is the dimension and n is the time horizon. In the stochastic setting the bound improves to Md2√npolylog(n,d) where M∈[d−1/2,d−1/4] is a constant that depends on the geometry of the constraint set and the desired computational properties.