PAGD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4134-4143, 2019.
Abstract
Alternating gradient descent (AGD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed AGD (PAGD) which is able to converge (with high probability) to second-order stationary points (SOSPs) with a global sublinear rate. Existing analyses of AGD-type algorithms either only guarantee convergence to first-order solutions, or converge to second-order solutions asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes $\mathcal{O}(\text{polylog}(d)/\epsilon^2)$ iterations to achieve an $(\epsilon,\sqrt{\epsilon})$-SOSP with high probability, where polylog$(d)$ denotes a polynomial of the logarithm of the problem dimension $d$.
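The abstract's description of PAGD, alternating gradient steps on two variable blocks plus an occasional random perturbation to escape saddle points, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the step size, perturbation radius, and the rule "perturb when the joint gradient is small" (in the spirit of perturbed gradient descent) are all assumptions made for illustration.

```python
import numpy as np

def pagd(grad_x, grad_y, x, y, step=0.05, eps=1e-3, radius=1e-2,
         n_iters=2000, rng=None):
    """Illustrative sketch of perturbed alternating gradient descent.

    grad_x, grad_y: callables returning the partial gradients of the
    objective with respect to each block. The perturbation schedule
    here is a simplification chosen for clarity.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        if np.sqrt(np.sum(gx**2) + np.sum(gy**2)) < eps:
            # Near a first-order stationary point: inject small uniform
            # noise so the iterates can escape a potential saddle.
            x = x + radius * rng.uniform(-1.0, 1.0, size=x.shape)
            y = y + radius * rng.uniform(-1.0, 1.0, size=y.shape)
            gx = grad_x(x, y)
        # Alternating updates: the y-gradient is recomputed at the
        # freshly updated x (this is what distinguishes AGD from a
        # joint gradient step).
        x = x - step * gx
        y = y - step * grad_y(x, y)
    return x, y
```

For example, on the smooth objective $f(x,y) = \tfrac12(x-1)^2 + \tfrac12(y-2)^2$, the iterates settle into a small neighborhood of the stationary point $(1, 2)$, with the perturbation keeping them within roughly the perturbation radius of it.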