Learning One Convolutional Layer with Overlapping Patches

Surbhi Goel, Adam Klivans, Raghu Meka
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1783-1791, 2018.

Abstract

We give the first provably efficient algorithm for learning a one-hidden-layer convolutional network with respect to a general class of (potentially overlapping) patches under mild conditions on the underlying distribution. We prove that our framework captures commonly used schemes from computer vision, including one-dimensional and two-dimensional “patch and stride” convolutions. Our algorithm, Convotron, is inspired by recent work applying isotonic regression to learning neural networks. Convotron uses a simple, iterative update rule that is stochastic in nature and tolerant to noise (it requires only that the conditional mean function is a one-hidden-layer convolutional network, as opposed to the fully realizable setting). In contrast to gradient descent, Convotron requires no special initialization or learning-rate tuning to converge to the global optimum. We also point out that learning one hidden convolutional layer with respect to a Gaussian distribution and just one disjoint patch $P$ (the other patches may be arbitrary) is easy in the following sense: Convotron can efficiently recover the hidden weight vector by updating only in the direction of $P$.
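The update rule sketched in the abstract can be made concrete. What follows is a minimal illustration, not the paper's reference implementation: it assumes ReLU activations, average pooling of the patch responses, and one-dimensional strided patches, and every name (convotron, patches, lr, the synthetic Gaussian setup) is our own. Only the derivative-free, Isotron-style update w <- w + lr * (y - f_w(x)) * sum_j x[P_j] follows the algorithm as described above.

import numpy as np

def convotron(samples, patches, dim_patch, steps, lr):
    # Predictor: f_w(x) = (1/k) * sum_j relu(w . x[P_j]), k = number of patches.
    # Update (GLMtron/Isotron-style, ignoring the activation's derivative):
    #   w <- w + lr * (y - f_w(x)) * sum_j x[P_j]
    w = np.zeros(dim_patch)  # no special initialization required
    for t in range(steps):
        x, y = samples[t % len(samples)]                # stochastic sample
        pred = np.mean([max(0.0, w @ x[p]) for p in patches])
        direction = np.sum([x[p] for p in patches], axis=0)
        w = w + lr * (y - pred) * direction
    return w

# Synthetic check under a Gaussian distribution (hypothetical toy setup;
# the fixed step size here is ad hoc, chosen only for this example):
rng = np.random.default_rng(0)
d, r, stride = 8, 4, 2
patches = [np.arange(i, i + r) for i in range(0, d - r + 1, stride)]  # overlapping patches
w_star = rng.standard_normal(r)                         # hidden weight vector
xs = rng.standard_normal((2000, d))
samples = [(x, np.mean([max(0.0, w_star @ x[p]) for p in patches])) for x in xs]
w_hat = convotron(samples, patches, r, steps=2000, lr=0.05)
print(np.linalg.norm(w_hat - w_star))                   # should shrink toward 0

Note that the update direction sums the raw patches rather than the gradient of the ReLU; this derivative-free direction is what lets the analysis dispense with special initialization.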

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-goel18a,
  title     = {Learning One Convolutional Layer with Overlapping Patches},
  author    = {Goel, Surbhi and Klivans, Adam and Meka, Raghu},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1783--1791},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/goel18a/goel18a.pdf},
  url       = {https://proceedings.mlr.press/v80/goel18a.html}
}
Endnote
%0 Conference Paper
%T Learning One Convolutional Layer with Overlapping Patches
%A Surbhi Goel
%A Adam Klivans
%A Raghu Meka
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-goel18a
%I PMLR
%P 1783--1791
%U https://proceedings.mlr.press/v80/goel18a.html
%V 80
APA
Goel, S., Klivans, A. & Meka, R. (2018). Learning One Convolutional Layer with Overlapping Patches. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1783-1791. Available from https://proceedings.mlr.press/v80/goel18a.html.