Optimization Landscape and Expressivity of Deep CNNs

Quynh Nguyen; Matthias Hein

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3730-3739, 2018.

Abstract

We analyze the loss landscape and expressiveness of practical deep convolutional neural networks (CNNs) with shared weights and max pooling layers. We show that such CNNs produce linearly independent features at a “wide” layer which has more neurons than the number of training samples. This condition holds e.g. for the VGG network. Furthermore, we provide for such wide CNNs necessary and sufficient conditions for global minima with zero training error. For the case where the wide layer is followed by a fully connected layer we show that almost every critical point of the empirical loss is a global minimum with zero training error. Our analysis suggests that both depth and width are very important in deep learning. While depth brings more representational power and allows the network to learn high level features, width smoothes the optimization landscape of the loss function in the sense that a sufficiently wide network has a well-behaved loss surface with almost no bad local minima.
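The linear-independence claim can be checked numerically on a toy case. The sketch below is not the authors' construction: it assumes a single 1D convolutional layer with shared filter weights, a tanh activation, no pooling, and randomly drawn inputs and weights, all chosen only for illustration. It builds the feature matrix (one row per training sample, one column per neuron of the layer) and computes its rank; when the layer has more neurons than there are samples, a rank equal to the number of samples means the feature vectors are linearly independent. The helper names `conv_features` and `matrix_rank` are hypothetical.

```python
import math
import random


def conv_features(x, filters, biases):
    """Apply each shared-weight 1D filter to every patch of x, then tanh."""
    feats = []
    for w, b in zip(filters, biases):
        k = len(w)
        for start in range(len(x) - k + 1):
            pre = sum(wi * xi for wi, xi in zip(w, x[start:start + k])) + b
            feats.append(math.tanh(pre))
    return feats


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Toy setup (assumed values): 5 samples, 4 filters of size 3 on length-8 inputs.
random.seed(0)
N, input_len, n_filters, filter_size = 5, 8, 4, 3
samples = [[random.uniform(-1, 1) for _ in range(input_len)] for _ in range(N)]
filters = [[random.gauss(0, 1) for _ in range(filter_size)] for _ in range(n_filters)]
biases = [random.gauss(0, 1) for _ in range(n_filters)]

# One row per sample, one column per neuron of the "wide" layer.
feature_matrix = [conv_features(x, filters, biases) for x in samples]
width = len(feature_matrix[0])
rank = matrix_rank(feature_matrix)
print(f"samples={N}, layer width={width}, rank={rank}")
```

Here the layer width is `n_filters * (input_len - filter_size + 1)`, which exceeds the number of samples, so the setup matches the "wide layer" condition in miniature. Full rank in this toy run illustrates the statement but does not substitute for the paper's proof or its precise assumptions.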

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-nguyen18a,
  title = 	 {Optimization Landscape and Expressivity of Deep {CNN}s},
  author =       {Nguyen, Quynh and Hein, Matthias},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {3730--3739},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/nguyen18a/nguyen18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/nguyen18a.html},
  abstract = 	 {We analyze the loss landscape and expressiveness of practical deep convolutional neural networks (CNNs) with shared weights and max pooling layers. We show that such CNNs produce linearly independent features at a “wide” layer which has more neurons than the number of training samples. This condition holds e.g. for the VGG network. Furthermore, we provide for such wide CNNs necessary and sufficient conditions for global minima with zero training error. For the case where the wide layer is followed by a fully connected layer we show that almost every critical point of the empirical loss is a global minimum with zero training error. Our analysis suggests that both depth and width are very important in deep learning. While depth brings more representational power and allows the network to learn high level features, width smoothes the optimization landscape of the loss function in the sense that a sufficiently wide network has a well-behaved loss surface with almost no bad local minima.}
}

EndNote

%0 Conference Paper
%T Optimization Landscape and Expressivity of Deep CNNs
%A Quynh Nguyen
%A Matthias Hein
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-nguyen18a
%I PMLR
%P 3730--3739
%U https://proceedings.mlr.press/v80/nguyen18a.html
%V 80
%X We analyze the loss landscape and expressiveness of practical deep convolutional neural networks (CNNs) with shared weights and max pooling layers. We show that such CNNs produce linearly independent features at a “wide” layer which has more neurons than the number of training samples. This condition holds e.g. for the VGG network. Furthermore, we provide for such wide CNNs necessary and sufficient conditions for global minima with zero training error. For the case where the wide layer is followed by a fully connected layer we show that almost every critical point of the empirical loss is a global minimum with zero training error. Our analysis suggests that both depth and width are very important in deep learning. While depth brings more representational power and allows the network to learn high level features, width smoothes the optimization landscape of the loss function in the sense that a sufficiently wide network has a well-behaved loss surface with almost no bad local minima.

APA

Nguyen, Q. & Hein, M. (2018). Optimization Landscape and Expressivity of Deep CNNs. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3730-3739. Available from https://proceedings.mlr.press/v80/nguyen18a.html.
