Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1812-1821, 2018.
Abstract
Asynchronous parallel stochastic gradient optimization has played a pivotal role in solving large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient using only two function evaluations, and thus have been applied to problems where explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is O(1/√T) for smooth, possibly nonconvex learning problems, which is significantly slower than O(1/T), the best convergence rate of (asynchronous) stochastic gradient algorithms. To fill this gap, in this paper we first point out the fundamental reason for the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zeroth-order algorithm (AsySZO+). Through rigorous theoretical analysis, we establish a faster convergence rate of O(1/(bT)) for AsySZO+, where b is the mini-batch size, which is a significant improvement over O(1/√T). Experimental results on the application of ensemble learning confirm that AsySZO+ converges faster than existing (asynchronous) stochastic zeroth-order algorithms.
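To illustrate the two-function-evaluation gradient estimate the abstract refers to, here is a minimal sketch of a standard two-point zeroth-order estimator (in the style of Nesterov-Spokoiny Gaussian smoothing). This is not the paper's exact method; the names (f, x, mu) and the smoothing scheme are illustrative assumptions.

```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-4, rng=None):
    """Estimate grad f(x) from just two evaluations of f.

    f  : callable mapping an ndarray to a scalar loss
    x  : current iterate (ndarray)
    mu : smoothing radius (small positive scalar, an assumed default)
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)        # random Gaussian direction
    # Finite-difference quotient along u, scaled by u: an unbiased
    # estimate of the gradient of a smoothed version of f.
    return (f(x + mu * u) - f(x)) / mu * u

if __name__ == "__main__":
    # Example: for f(z) = 0.5 * ||z||^2 the true gradient at x is x itself,
    # so averaging many estimates should approach a vector of ones.
    f = lambda z: 0.5 * np.dot(z, z)
    x = np.ones(5)
    est = np.mean([zo_gradient_estimate(f, x) for _ in range(5000)], axis=0)
    print(est)
```

Each call touches f only twice, which is why such estimators suit settings where explicit gradients are expensive or unavailable; the variance of the single-sample estimate is what mini-batching (the b in O(1/(bT))) helps reduce.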