On the Learnability of Fully-Connected Neural Networks

Yuchen Zhang, Jason Lee, Martin Wainwright, Michael I. Jordan
; Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:83-91, 2017.

Abstract

Despite the empirical success of deep neural networks, there is limited theoretical understanding on the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\text{poly}(n,d,\exp(1/ε^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $ε> 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/ε$ is unavoidable unless $\bf RP = \bf NP$. Our second result shows that the exponential dependence on $1/ε$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $γ> 0$ by an unknown neural network, then the network can be learned in $\text{poly}(n,d,1/ε)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/γ$ is unimprovable under a certain cryptographic assumption.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-zhang17a, title = {{On the learnability of fully-connected neural networks}}, author = {Yuchen Zhang and Jason Lee and Martin Wainwright and Michael I. Jordan}, pages = {83--91}, year = {2017}, editor = {Aarti Singh and Jerry Zhu}, volume = {54}, series = {Proceedings of Machine Learning Research}, address = {Fort Lauderdale, FL, USA}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/zhang17a/zhang17a.pdf}, url = {http://proceedings.mlr.press/v54/zhang17a.html}, abstract = {Despite the empirical success of deep neural networks, there is limited theoretical understanding on the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\text{poly}(n,d,\exp(1/ε^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $ε> 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/ε$ is unavoidable unless $\bf RP = \bf NP$. Our second result shows that the exponential dependence on $1/ε$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $γ> 0$ by an unknown neural network, then the network can be learned in $\text{poly}(n,d,1/ε)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/γ$ is unimprovable under a certain cryptographic assumption.} }
Endnote
%0 Conference Paper %T On the Learnability of Fully-Connected Neural Networks %A Yuchen Zhang %A Jason Lee %A Martin Wainwright %A Michael I. Jordan %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-zhang17a %I PMLR %J Proceedings of Machine Learning Research %P 83--91 %U http://proceedings.mlr.press %V 54 %W PMLR %X Despite the empirical success of deep neural networks, there is limited theoretical understanding on the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\text{poly}(n,d,\exp(1/ε^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $ε> 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/ε$ is unavoidable unless $\bf RP = \bf NP$. Our second result shows that the exponential dependence on $1/ε$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $γ> 0$ by an unknown neural network, then the network can be learned in $\text{poly}(n,d,1/ε)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/γ$ is unimprovable under a certain cryptographic assumption.
APA
Zhang, Y., Lee, J., Wainwright, M. & Jordan, M.I.. (2017). On the Learnability of Fully-Connected Neural Networks. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in PMLR 54:83-91

Related Material