Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko; Marco Mondelli

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8773-8784, 2020.

Abstract

The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and require the number of neurons to scale linearly with the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
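The dropout-and-rescale operation and the loss along a linear path can be illustrated on a toy model. The sketch below assumes a two-layer network in the mean-field scaling f(x) = (1/N) * sum_i a_i * ReLU(<w_i, x>) with squared loss. All function names (`output`, `dropout_gap`, `half_dropout_point`, `max_loss_on_path`) are hypothetical illustrations, not code from the paper. Keeping m neurons and averaging over m instead of N is the rescaling step. Doubling the first half of the output weights and zeroing the rest represents the same half-size network inside the original N-neuron parameter space. Because the output is linear in the output weights a (for fixed W), the squared loss on the straight segment between these two points never exceeds the larger endpoint loss, so a small dropout gap implies a small loss barrier on that segment. This shows one ingredient of a piecewise linear path under these assumptions; it is not the paper's full construction.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def output(x, a, W, keep=None):
    # Assumed mean-field two-layer network:
    #   f(x) = (1/m) * sum_{i in S} a_i * relu(<w_i, x>)
    # S is the set of kept neurons (all N by default) and m = |S|.
    # Averaging over m instead of N is the rescaling applied after dropout.
    idx = list(range(len(a))) if keep is None else list(keep)
    total = sum(a[i] * relu(sum(w * xj for w, xj in zip(W[i], x))) for i in idx)
    return total / len(idx)


def loss(a, W, data, keep=None):
    # Mean squared error over a list of (x, y) pairs.
    return sum((output(x, a, W, keep) - y) ** 2 for x, y in data) / len(data)


def dropout_gap(a, W, data, m):
    # Change in loss when only the first m neurons are kept (and rescaled).
    return loss(a, W, data, keep=range(m)) - loss(a, W, data)


def half_dropout_point(a):
    # Same function as keep=range(N // 2), written in the full N-neuron
    # parameter space (assumes N is even): double the first half of the
    # output weights and set the second half to zero.
    h = len(a) // 2
    return [2.0 * ai if i < h else 0.0 for i, ai in enumerate(a)]


def interpolate(theta0, theta1, t):
    # Point (1 - t) * theta0 + t * theta1 on a straight segment, theta = (a, W).
    (a0, W0), (a1, W1) = theta0, theta1
    a = [(1 - t) * u + t * v for u, v in zip(a0, a1)]
    W = [[(1 - t) * u + t * v for u, v in zip(r0, r1)] for r0, r1 in zip(W0, W1)]
    return a, W


def max_loss_on_path(waypoints, data, steps=20):
    # Largest loss on an evenly spaced grid along the piecewise linear path
    # waypoints[0] -> waypoints[1] -> ...; a grid check, not a certificate.
    worst = -math.inf
    for p, q in zip(waypoints, waypoints[1:]):
        for s in range(steps + 1):
            a, W = interpolate(p, q, s / steps)
            worst = max(worst, loss(a, W, data))
    return worst


if __name__ == "__main__":
    # Random placeholder parameters and data, NOT an SGD solution.
    random.seed(0)
    N, d = 100, 5
    a = [random.gauss(0, 1) for _ in range(N)]
    W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
    data = [([random.gauss(0, 1) for _ in range(d)], random.gauss(0, 1))
            for _ in range(30)]
    theta, theta_half = (a, W), (half_dropout_point(a), W)
    print("dropout gap (keep N/2):", dropout_gap(a, W, data, N // 2))
    print("max loss on segment:", max_loss_on_path([theta, theta_half], data))
    print("endpoint losses:", loss(*theta, data), loss(*theta_half, data))
```

The random weights only exercise the code; they say nothing about the bounds in the paper, which concern parameters found by SGD on wide networks. The segment bound itself uses only the convexity of the squared loss in the network output, so it holds for any fixed W.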

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-shevchenko20a,
  title = 	 {Landscape Connectivity and Dropout Stability of {SGD} Solutions for Over-parameterized Neural Networks},
  author =       {Shevchenko, Alexander and Mondelli, Marco},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {8773--8784},
  year = 	 {2020},
  editor = 	 {Daum{\'e} III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v119/shevchenko20a/shevchenko20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/shevchenko20a.html},
  abstract = 	 {The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and require the number of neurons to scale linearly with the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.}
}

Endnote

%0 Conference Paper
%T Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
%A Alexander Shevchenko
%A Marco Mondelli
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-shevchenko20a
%I PMLR
%P 8773-8784
%U https://proceedings.mlr.press/v119/shevchenko20a.html
%V 119
%X The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and require the number of neurons to scale linearly with the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.

APA

Shevchenko, A., & Mondelli, M. (2020). Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8773-8784. Available from https://proceedings.mlr.press/v119/shevchenko20a.html.