Learning Polynomials with Neural Networks

Alexandr Andoni; Rina Panigrahy; Gregory Valiant; Li Zhang

Learning Polynomials with Neural Networks

Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1908-1916, 2014.

Abstract

We study the effectiveness of learning low degree polynomials using neural networks by the gradient descent method. While neural networks have been shown to have great expressive power, and gradient descent has been widely used in practice for learning neural networks, few theoretical guarantees are known for such methods. In particular, it is well known that gradient descent can get stuck at local minima, even for simple classes of target functions. In this paper, we present several positive theoretical results to support the effectiveness of neural networks. We focus on two-layer neural networks (i.e. one hidden layer) where the top layer node is a linear function, similar to \citebarron93. First we show that for a randomly initialized neural network with sufficiently many hidden units, the gradient descent method can learn any low degree polynomial. Secondly, we show that if we use complex-valued weights (the target function can still be real), then under suitable conditions, there are no “robust local minima”: the neural network can always escape a local minimum by performing a random perturbation. This property does not hold for real-valued weights. Thirdly, we discuss whether sparse polynomials can be learned with \emphsmall neural networks, where the size is dependent on the sparsity of the target function.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-andoni14,
  title = 	 {Learning Polynomials with Neural Networks},
  author = 	 {Andoni, Alexandr and Panigrahy, Rina and Valiant, Gregory and Zhang, Li},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1908--1916},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/andoni14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/andoni14.html},
  abstract = 	 {We study the effectiveness of learning low degree polynomials using   neural networks by the gradient descent method.  While neural   networks have been shown to have great expressive power, and gradient   descent has been widely used in practice for learning neural   networks, few theoretical guarantees are known for such methods.  In   particular, it is well known that gradient descent can get stuck at   local minima, even for simple classes of target functions.  In this   paper, we present several positive theoretical results to support the   effectiveness of neural networks.  We focus on two-layer neural   networks (i.e. one hidden layer) where the top layer node is a linear   function, similar to \citebarron93.  First we show that for a   randomly initialized neural network with sufficiently many hidden   units, the gradient descent method can learn any low degree   polynomial.  Secondly, we show that if we use complex-valued weights   (the target function can still be real), then under suitable   conditions, there are no “robust local minima”: the neural network   can always escape a local minimum by performing a random   perturbation. This property does not hold for real-valued weights.   Thirdly, we discuss whether sparse polynomials can be learned   with \emphsmall neural networks, where the size is dependent on the   sparsity of the target function.}
}

Endnote

%0 Conference Paper
%T Learning Polynomials with Neural Networks
%A Alexandr Andoni
%A Rina Panigrahy
%A Gregory Valiant
%A Li Zhang
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-andoni14
%I PMLR
%P 1908--1916
%U https://proceedings.mlr.press/v32/andoni14.html
%V 32
%N 2
%X We study the effectiveness of learning low degree polynomials using   neural networks by the gradient descent method.  While neural   networks have been shown to have great expressive power, and gradient   descent has been widely used in practice for learning neural   networks, few theoretical guarantees are known for such methods.  In   particular, it is well known that gradient descent can get stuck at   local minima, even for simple classes of target functions.  In this   paper, we present several positive theoretical results to support the   effectiveness of neural networks.  We focus on two-layer neural   networks (i.e. one hidden layer) where the top layer node is a linear   function, similar to \citebarron93.  First we show that for a   randomly initialized neural network with sufficiently many hidden   units, the gradient descent method can learn any low degree   polynomial.  Secondly, we show that if we use complex-valued weights   (the target function can still be real), then under suitable   conditions, there are no “robust local minima”: the neural network   can always escape a local minimum by performing a random   perturbation. This property does not hold for real-valued weights.   Thirdly, we discuss whether sparse polynomials can be learned   with \emphsmall neural networks, where the size is dependent on the   sparsity of the target function.

RIS


TY  - CPAPER
TI  - Learning Polynomials with Neural Networks
AU  - Alexandr Andoni
AU  - Rina Panigrahy
AU  - Gregory Valiant
AU  - Li Zhang
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-andoni14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1908
EP  - 1916
L1  - http://proceedings.mlr.press/v32/andoni14.pdf
UR  - https://proceedings.mlr.press/v32/andoni14.html
AB  - We study the effectiveness of learning low degree polynomials using   neural networks by the gradient descent method.  While neural   networks have been shown to have great expressive power, and gradient   descent has been widely used in practice for learning neural   networks, few theoretical guarantees are known for such methods.  In   particular, it is well known that gradient descent can get stuck at   local minima, even for simple classes of target functions.  In this   paper, we present several positive theoretical results to support the   effectiveness of neural networks.  We focus on two-layer neural   networks (i.e. one hidden layer) where the top layer node is a linear   function, similar to \citebarron93.  First we show that for a   randomly initialized neural network with sufficiently many hidden   units, the gradient descent method can learn any low degree   polynomial.  Secondly, we show that if we use complex-valued weights   (the target function can still be real), then under suitable   conditions, there are no “robust local minima”: the neural network   can always escape a local minimum by performing a random   perturbation. This property does not hold for real-valued weights.   Thirdly, we discuss whether sparse polynomials can be learned   with \emphsmall neural networks, where the size is dependent on the   sparsity of the target function.
ER  -

APA


Andoni, A., Panigrahy, R., Valiant, G. & Zhang, L.. (2014). Learning Polynomials with Neural Networks. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1908-1916 Available from https://proceedings.mlr.press/v32/andoni14.html.

Related Material

Download PDF