PAGD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4134-4143, 2019.
Abstract
Alternating gradient descent (AGD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed AGD (PAGD) which is able to converge (with high probability) to second-order stationary points (SOSPs) with a global sublinear rate. Existing analyses of AGD-type algorithms either only guarantee convergence to first-order solutions, or converge to second-order solutions asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes $\mathcal{O}(\text{polylog}(d)/\epsilon^2)$ iterations to achieve an $(\epsilon,\sqrt{\epsilon})$-SOSP with high probability, where polylog$(d)$ denotes a polynomial of the logarithm of the problem dimension $d$.
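The abstract's description of PAGD, alternating gradient steps on two variable blocks plus an occasional random perturbation to escape saddle points, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the step size, perturbation radius, and the rule "perturb when the joint gradient is small" (in the spirit of perturbed gradient descent) are all assumptions made for illustration.

```python
import numpy as np

def pagd(grad_x, grad_y, x, y, step=0.05, eps=1e-3, radius=1e-2,
         n_iters=2000, rng=None):
    """Illustrative sketch of perturbed alternating gradient descent.

    grad_x, grad_y: callables returning the partial gradients of the
    objective with respect to each block. The perturbation schedule
    here is a simplification chosen for clarity.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        if np.sqrt(np.sum(gx**2) + np.sum(gy**2)) < eps:
            # Near a first-order stationary point: inject small uniform
            # noise so the iterates can escape a potential saddle.
            x = x + radius * rng.uniform(-1.0, 1.0, size=x.shape)
            y = y + radius * rng.uniform(-1.0, 1.0, size=y.shape)
            gx = grad_x(x, y)
        # Alternating updates: the y-gradient is recomputed at the
        # freshly updated x (this is what distinguishes AGD from a
        # joint gradient step).
        x = x - step * gx
        y = y - step * grad_y(x, y)
    return x, y
```

For example, on the smooth objective $f(x,y) = \tfrac12(x-1)^2 + \tfrac12(y-2)^2$, the iterates settle into a small neighborhood of the stationary point $(1, 2)$, with the perturbation keeping them within roughly the perturbation radius of it.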