Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition

Rong Ge; Furong Huang; Chi Jin; Yang Yuan

Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition

Rong Ge, Furong Huang, Chi Jin, Yang Yuan

Proceedings of The 28th Conference on Learning Theory, PMLR 40:797-842, 2015.

Abstract

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in \em saddle points. In this paper we identify \em strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that from an \em arbitrary starting point, stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives \em global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Ge15,
  title = 	 {Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition},
  author = 	 {Ge, Rong and Huang, Furong and Jin, Chi and Yuan, Yang},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {797--842},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Ge15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Ge15.html},
  abstract = 	 {We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in \em saddle points. In this paper we identify \em strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that from an \em arbitrary starting point,  stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives \em global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of  latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.}
}

Endnote

%0 Conference Paper
%T Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition
%A Rong Ge
%A Furong Huang
%A Chi Jin
%A Yang Yuan
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Ge15
%I PMLR
%P 797--842
%U https://proceedings.mlr.press/v40/Ge15.html
%V 40
%X We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in \em saddle points. In this paper we identify \em strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that from an \em arbitrary starting point,  stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives \em global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of  latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.

RIS


TY  - CPAPER
TI  - Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition
AU  - Rong Ge
AU  - Furong Huang
AU  - Chi Jin
AU  - Yang Yuan
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Ge15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 797
EP  - 842
L1  - http://proceedings.mlr.press/v40/Ge15.pdf
UR  - https://proceedings.mlr.press/v40/Ge15.html
AB  - We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in \em saddle points. In this paper we identify \em strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that from an \em arbitrary starting point,  stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives \em global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of  latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.
ER  -

APA


Ge, R., Huang, F., Jin, C. & Yuan, Y.. (2015). Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:797-842 Available from https://proceedings.mlr.press/v40/Ge15.html.

Related Material

Download PDF