Optimal Sample Complexity Bounds for Non-convex Optimization under Kurdyka-Lojasiewicz Condition
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6806-6821, 2023.
Optimization of smooth reward functions under bandit feedback is a long-standing problem in online learning. This paper approaches the problem by studying convergence under smoothness and Kurdyka-Lojasiewicz conditions. We design a search-based algorithm that achieves an improved rate compared to the standard gradient-based method. Together with a matching lower bound, this shows that the algorithm's dependence on precision is optimal in the low-dimensional regime.
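For readers unfamiliar with the condition, one standard textbook form of the Kurdyka-Lojasiewicz inequality is sketched below; the exact parametrization used in the paper may differ, and the symbols (exponent, constant) are the conventional ones, not taken from this abstract.

```latex
% Standard Kurdyka-Lojasiewicz (KL) inequality (textbook form; the
% paper's exact variant may differ). For a differentiable f with
% minimum value f^*, there exist \mu > 0 and an exponent
% \theta \in [1/2, 1) such that, near the minimizer,
\[
  \|\nabla f(x)\| \;\ge\; \mu \,\bigl(f(x) - f^{*}\bigr)^{\theta}.
\]
% The special case \theta = 1/2 recovers the Polyak-Lojasiewicz
% condition, \|\nabla f(x)\|^{2} \ge 2\mu\,(f(x) - f^{*}), under
% which gradient descent on a smooth f converges linearly.
```

The inequality lower-bounds the gradient norm by the suboptimality gap, which is what allows gradient-based (and, in this paper, search-based) methods to make guaranteed progress on non-convex objectives.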