An optimization and generalization analysis for max-pooling networks

Alon Brutzkus; Amir Globerson

An optimization and generalization analysis for max-pooling networks

Alon Brutzkus, Amir Globerson

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:1650-1660, 2021.

Abstract

Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is the effect of over-parameterization on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by pattern detection problem, where a “discriminative” pattern needs to be detected among “spurious” patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.

Cite this Paper

BibTeX


@InProceedings{pmlr-v161-brutzkus21a,
  title = 	 {An optimization and generalization analysis for max-pooling networks},
  author =       {Brutzkus, Alon and Globerson, Amir},
  booktitle = 	 {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {1650--1660},
  year = 	 {2021},
  editor = 	 {de Campos, Cassio and Maathuis, Marloes H.},
  volume = 	 {161},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {27--30 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v161/brutzkus21a/brutzkus21a.pdf},
  url = 	 {https://proceedings.mlr.press/v161/brutzkus21a.html},
  abstract = 	 {Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is the effect of over-parameterization on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by pattern detection problem, where a “discriminative” pattern needs to be detected among “spurious” patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.}
}

Endnote

%0 Conference Paper
%T An optimization and generalization analysis for max-pooling networks
%A Alon Brutzkus
%A Amir Globerson
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis	
%F pmlr-v161-brutzkus21a
%I PMLR
%P 1650--1660
%U https://proceedings.mlr.press/v161/brutzkus21a.html
%V 161
%X Max-Pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, and what is the effect of over-parameterization on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models. Our analysis focuses on a data generating distribution inspired by pattern detection problem, where a “discriminative” pattern needs to be detected among “spurious” patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.

APA


Brutzkus, A. & Globerson, A.. (2021). An optimization and generalization analysis for max-pooling networks. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:1650-1660 Available from https://proceedings.mlr.press/v161/brutzkus21a.html.

An optimization and generalization analysis for max-pooling networks

Abstract

Cite this Paper

Related Material