Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Tianyi Liu; Yan Li; Song Wei; Enlu Zhou; Tuo Zhao

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1891-1899, 2021.

Abstract

Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.

Cite this Paper

BibTeX


@InProceedings{pmlr-v130-liu21e,
  title = 	 { Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization },
  author =       {Liu, Tianyi and Li, Yan and Wei, Song and Zhou, Enlu and Zhao, Tuo},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1891--1899},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/liu21e/liu21e.pdf},
  url = 	 {https://proceedings.mlr.press/v130/liu21e.html},
  abstract = 	 { Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory. }
}

Endnote

%0 Conference Paper
%T  Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization 
%A Tianyi Liu
%A Yan Li
%A Song Wei
%A Enlu Zhou
%A Tuo Zhao
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu	
%F pmlr-v130-liu21e
%I PMLR
%P 1891--1899
%U https://proceedings.mlr.press/v130/liu21e.html
%V 130
%X  Numerous empirical evidences have corroborated the importance of noise in nonconvex optimization problems. The theory behind such empirical observations, however, is still largely unknown. This paper studies this fundamental problem through investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes implicit bias towards certain optima. Numerical experiments are provided to support our theory.

APA


Liu, T., Li, Y., Wei, S., Zhou, E. & Zhao, T.. (2021).  Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1891-1899 Available from https://proceedings.mlr.press/v130/liu21e.html.

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Abstract

Cite this Paper

Related Material