Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines

Bin Gu, Zhouyuan Huo, Cheng Deng, Heng Huang
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1812-1821, 2018.

Abstract

Asynchronous parallel stochastic gradient optimization has been playing a pivotal role to solve large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient only by two function evaluations, thus have been applied to solve the problems where the explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is O(1/SQRT{T}) for the smooth, possibly non-convex learning problems, which is significantly slower than O(1/T) the best convergence rate of (asynchronous) stochastic gradient algorithm. To fill this gap, in this paper, we first point out the fundamental reason leading to the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zerothorder algorithm (AsySZO+). We provide a faster convergence rate O(1/bT) (b is the mini-batch size) for AsySZO+ by the rigorous theoretical analysis, which is a significant improvement over O(1/SQRT{T}). The experimental results on the application of ensemble learning confirm that our AsySZO+ has a faster convergence rate than the existing (asynchronous) stochastic zeroth-order algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-gu18a, title = {Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines}, author = {Gu, Bin and Huo, Zhouyuan and Deng, Cheng and Huang, Heng}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {1812--1821}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/gu18a/gu18a.pdf}, url = {http://proceedings.mlr.press/v80/gu18a.html}, abstract = {Asynchronous parallel stochastic gradient optimization has been playing a pivotal role to solve large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient only by two function evaluations, thus have been applied to solve the problems where the explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is O(1/SQRT{T}) for the smooth, possibly non-convex learning problems, which is significantly slower than O(1/T) the best convergence rate of (asynchronous) stochastic gradient algorithm. To fill this gap, in this paper, we first point out the fundamental reason leading to the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zerothorder algorithm (AsySZO+). We provide a faster convergence rate O(1/bT) (b is the mini-batch size) for AsySZO+ by the rigorous theoretical analysis, which is a significant improvement over O(1/SQRT{T}). The experimental results on the application of ensemble learning confirm that our AsySZO+ has a faster convergence rate than the existing (asynchronous) stochastic zeroth-order algorithms.} }
Endnote
%0 Conference Paper %T Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines %A Bin Gu %A Zhouyuan Huo %A Cheng Deng %A Heng Huang %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-gu18a %I PMLR %P 1812--1821 %U http://proceedings.mlr.press/v80/gu18a.html %V 80 %X Asynchronous parallel stochastic gradient optimization has been playing a pivotal role to solve large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient only by two function evaluations, thus have been applied to solve the problems where the explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is O(1/SQRT{T}) for the smooth, possibly non-convex learning problems, which is significantly slower than O(1/T) the best convergence rate of (asynchronous) stochastic gradient algorithm. To fill this gap, in this paper, we first point out the fundamental reason leading to the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zerothorder algorithm (AsySZO+). We provide a faster convergence rate O(1/bT) (b is the mini-batch size) for AsySZO+ by the rigorous theoretical analysis, which is a significant improvement over O(1/SQRT{T}). The experimental results on the application of ensemble learning confirm that our AsySZO+ has a faster convergence rate than the existing (asynchronous) stochastic zeroth-order algorithms.
APA
Gu, B., Huo, Z., Deng, C. & Huang, H.. (2018). Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1812-1821 Available from http://proceedings.mlr.press/v80/gu18a.html.

Related Material