PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Songtao Lu, Mingyi Hong, Zhengdao Wang
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4134-4143, 2019.

Abstract

Alternating gradient descent (A-GD) is a simple yet popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) that converges (with high probability) to second-order stationary points (SOSPs) at a global sublinear rate. Existing analyses of A-GD-type algorithms either guarantee convergence only to first-order solutions, or establish convergence to second-order solutions only asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes $\mathcal{O}(\text{polylog}(d)/\epsilon^2)$ iterations to achieve an $(\epsilon,\sqrt{\epsilon})$-SOSP with high probability, where $\text{polylog}(d)$ denotes a polynomial of the logarithm of the problem dimension $d$.
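The sketch below (Python/NumPy) illustrates the perturbed alternating gradient descent idea described in the abstract: two blocks of variables are updated one after the other with gradient steps, and whenever the joint gradient becomes small the iterate is perturbed inside a small ball to help it escape strict saddle points. The function name `pa_gd`, the step size, perturbation radius, trigger, and iteration budget are illustrative assumptions, not the exact algorithm parameters analyzed in the paper.

```python
# Minimal sketch of perturbed alternating gradient descent (PA-GD).
# Illustrative reading of the abstract, not the paper's exact algorithm:
# the perturbation trigger, radius, and iteration budget are simplified
# assumptions chosen for readability.
import numpy as np

def pa_gd(grad_x, grad_y, x, y, step=1e-2, eps=1e-3,
          perturb_radius=1e-2, escape_iters=50, max_iters=10_000, seed=0):
    """Alternately update blocks x and y with gradient steps; when the joint
    gradient is small (a candidate first-order stationary point), inject a
    random perturbation to help the iterate escape strict saddle points."""
    rng = np.random.default_rng(seed)
    cooldown = 0  # iterations to wait before another perturbation is allowed
    for _ in range(max_iters):
        # Alternating (Gauss-Seidel-style) updates: x first, then y at the new x.
        x = x - step * grad_x(x, y)
        y = y - step * grad_y(x, y)
        g = np.concatenate([grad_x(x, y), grad_y(x, y)])
        if np.linalg.norm(g) <= eps and cooldown == 0:
            # Sample (approximately) uniformly from a small ball and perturb.
            d = x.size + y.size
            u = rng.standard_normal(d)
            u *= perturb_radius * rng.random() ** (1.0 / d) / np.linalg.norm(u)
            x, y = x + u[:x.size], y + u[x.size:]
            cooldown = escape_iters
        cooldown = max(cooldown - 1, 0)
    return x, y

# Toy usage: f(x, y) = 0.5 * (x*y - 1)^2 has a strict saddle at the origin;
# plain A-GD started there never moves, while the perturbation lets PA-GD
# escape toward the valley of minima x*y = 1.
gx = lambda x, y: (x * y - 1.0) * y   # df/dx
gy = lambda x, y: (x * y - 1.0) * x   # df/dy
x_star, y_star = pa_gd(gx, gy, np.zeros(1), np.zeros(1))
print(x_star * y_star)  # close to 1 after escaping the saddle
```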

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-lu19a,
  title     = {{PA}-{GD}: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization},
  author    = {Lu, Songtao and Hong, Mingyi and Wang, Zhengdao},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4134--4143},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/lu19a/lu19a.pdf},
  url       = {https://proceedings.mlr.press/v97/lu19a.html}
}
Endnote
%0 Conference Paper
%T PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
%A Songtao Lu
%A Mingyi Hong
%A Zhengdao Wang
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-lu19a
%I PMLR
%P 4134--4143
%U https://proceedings.mlr.press/v97/lu19a.html
%V 97
APA
Lu, S., Hong, M. & Wang, Z. (2019). PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4134-4143. Available from https://proceedings.mlr.press/v97/lu19a.html.
