On the Learnability of Fully-Connected Neural Networks
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:83-91, 2017.
Abstract
Despite the empirical success of deep neural networks, there is limited theoretical understanding of the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\mathrm{poly}(n, d, \exp(1/\epsilon^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $\epsilon > 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, while avoiding the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/\epsilon$ is unavoidable unless $\mathbf{RP} = \mathbf{NP}$. Our second result shows that the exponential dependence on $1/\epsilon$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $\gamma > 0$ by an unknown neural network, then the network can be learned in $\mathrm{poly}(n, d, 1/\epsilon)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/\gamma$ is unimprovable under a certain cryptographic assumption.
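As a rough illustration of the constrained model class described above (the notation here is ours and only a sketch, assuming a depth-$m$ fully-connected architecture with activation $\sigma$; the paper's exact formalization may differ), the $\ell_1$-regularization amounts to restricting attention to networks of the form
\[
f_\theta(x) \;=\; w_m^\top\, \sigma\!\big(W_{m-1}\,\sigma(\cdots \sigma(W_1 x)\cdots)\big),
\qquad
\|v\|_1 \le B \ \text{ for every row } v \text{ of } W_1,\dots,W_{m-1} \text{ and for } v = w_m,
\]
so that the incoming weights of each neuron range over the $\ell_1$-ball of radius $B$ rather than all of $\mathbb{R}^d$. This bounded search space is what makes a sampling-based search plausible in place of an exhaustive $\exp(d)$ sweep over raw parameters, as suggested by the abstract.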