On the Quality of the Initial Basin in Overspecified Neural Networks

Itay Safran; Ohad Shamir

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:774-782, 2016.

Abstract

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-safran16,
  title = 	 {On the Quality of the Initial Basin in Overspecified Neural Networks},
  author = 	 {Safran, Itay and Shamir, Ohad},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {774--782},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/safran16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/safran16.html},
  abstract = 	 {Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the \emph{geometric} structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.}
}

Endnote

%0 Conference Paper
%T On the Quality of the Initial Basin in Overspecified Neural Networks
%A Itay Safran
%A Ohad Shamir
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-safran16
%I PMLR
%P 774--782
%U https://proceedings.mlr.press/v48/safran16.html
%V 48
%X Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.

RIS


TY  - CPAPER
TI  - On the Quality of the Initial Basin in Overspecified Neural Networks
AU  - Itay Safran
AU  - Ohad Shamir
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-safran16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 774
EP  - 782
L1  - http://proceedings.mlr.press/v48/safran16.pdf
UR  - https://proceedings.mlr.press/v48/safran16.html
AB  - Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.
ER  -

APA


Safran, I. & Shamir, O. (2016). On the Quality of the Initial Basin in Overspecified Neural Networks. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:774-782. Available from https://proceedings.mlr.press/v48/safran16.html.
