Learning Neural Networks with Two Nonlinear Layers in Polynomial Time

Surbhi Goel, Adam R. Klivans
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1470-1499, 2019.

Abstract

We give a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU). The algorithm succeeds with respect to any distribution on the unit ball in $n$ dimensions (hidden weight vectors in the first layer have unit norm). This is the first efficient algorithm for learning a general class of neural networks with more than one nonlinear layer that makes no restrictions on the VC-dimension of the network. Algorithms for learning relaxations of our model (e.g., allowing larger weight vectors in the first layer) would lead to breakthroughs on notoriously hard problems in Boolean function learning. Thus, our results are “best possible” with respect to current techniques. Our algorithm, Alphatron, is an iterative update rule that combines isotonic regression with kernel methods. We use this algorithm to give a simple reduction for translating PAC learning algorithms to the more general, real-valued setting of probabilistic concepts, a model that (unlike PAC learning) requires non-i.i.d. noise-tolerance. This substantially improves many longstanding results for PAC learning Boolean functions.
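The abstract's one-line description of Alphatron (an iterative update rule combining isotonic regression with kernel methods) can be made concrete. Below is a minimal Python sketch of a kernelized update in this style, assuming an additive per-example correction of dual coefficients through a monotone link function; the function names, the learning rate lam, the RBF kernel, and the validation-based model selection are illustrative assumptions, not the paper's exact specification.

import numpy as np

def alphatron(X, y, u, K, lam=1.0, T=100, X_val=None, y_val=None):
    # Sketch of an Alphatron-style update (assumed form, not the paper's
    # verbatim algorithm): keep one dual coefficient alpha_i per training
    # example and nudge it toward the residual y_i - h(x_i), where
    # h(x) = u(sum_j alpha_j * K(x_j, x)).
    m = X.shape[0]
    G = K(X, X)                          # m x m Gram matrix over training points
    alpha = np.zeros(m)
    best_alpha, best_err = alpha.copy(), np.inf
    for _ in range(T):
        h = u(G @ alpha)                 # current predictions on the training set
        alpha = alpha + (lam / m) * (y - h)
        if X_val is not None:            # optionally keep the best iterate
            err = np.mean((u(K(X_val, X) @ alpha) - y_val) ** 2)
            if err < best_err:
                best_err, best_alpha = err, alpha.copy()
    return best_alpha if X_val is not None else alpha

# Illustrative link and kernel choices (assumptions, not the paper's):
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rbf = lambda A, B: np.exp(-np.linalg.norm(A[:, None] - B[None, :], axis=2) ** 2)

# Example usage on random inputs normalized to the unit ball:
X = np.random.randn(50, 5)
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.random.rand(50)
alpha = alphatron(X, y, sigmoid, rbf, lam=0.5, T=200)

The learned hypothesis is x ↦ u(Σ_j α_j K(x_j, x)); retaining the best iterate on held-out data mirrors the kind of model selection such analyses typically call for.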

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-goel19b,
  title     = {Learning Neural Networks with Two Nonlinear Layers in Polynomial Time},
  author    = {Goel, Surbhi and Klivans, Adam R.},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {1470--1499},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/goel19b/goel19b.pdf},
  url       = {https://proceedings.mlr.press/v99/goel19b.html}
}
Endnote
%0 Conference Paper
%T Learning Neural Networks with Two Nonlinear Layers in Polynomial Time
%A Surbhi Goel
%A Adam R. Klivans
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-goel19b
%I PMLR
%P 1470--1499
%U https://proceedings.mlr.press/v99/goel19b.html
%V 99
APA
Goel, S. & Klivans, A. R. (2019). Learning Neural Networks with Two Nonlinear Layers in Polynomial Time. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:1470-1499. Available from https://proceedings.mlr.press/v99/goel19b.html.