Reliably Learning the ReLU in Polynomial Time

Surbhi Goel; Varun Kanade; Adam Klivans; Justin Thaler

Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1004-1042, 2017.

Abstract

We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \mathsf{max}(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade and Mansour (2012) where the learner is given access to a distribution $\mathcal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to any distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon = \Omega(1 / \log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLU with fixed polynomial-dependence in the dimension. For depth-2 networks of sigmoids, we obtain the first algorithms that have a polynomial dependency in all parameters. Our techniques combine kernel methods and polynomial approximations with a “dual-loss” approach to convex programming. As a byproduct we obtain a number of applications including the first set of efficient algorithms for “convex piecewise-linear fitting” and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.
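
The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only evaluates a fixed ReLU hypothesis on a labeled sample. The function names, the choice of squared loss, and the empirical definitions (a "false positive" is a nonzero prediction on an example labeled 0; the positive loss is summed over examples with positive labels and divided by the sample size) are assumptions made for illustration, based on a reading of the abstract.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, float]


def relu(w: Vector, x: Vector) -> float:
    """ReLU with weight vector w: x -> max(0, w . x)."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)))


def squared_loss(prediction: float, label: float) -> float:
    """Illustrative loss (an assumption); convex, and bounded on a bounded range."""
    return (prediction - label) ** 2


def reliable_losses(
    h: Callable[[Vector], float],
    examples: Sequence[Example],
    loss: Callable[[float, float], float] = squared_loss,
) -> Tuple[float, float]:
    """Return (false_positive_rate, positive_loss) for hypothesis h.

    false_positive_rate: fraction of the sample labeled 0 where h is nonzero.
    positive_loss: loss summed over examples with label > 0, divided by the
    sample size. Both definitions are assumptions for this sketch.
    """
    n = len(examples)
    false_positives = sum(1 for x, y in examples if y == 0 and h(x) != 0)
    positive_loss = sum(loss(h(x), y) for x, y in examples if y > 0)
    return false_positives / n, positive_loss / n


# Hypothetical sample on the unit circle (n = 2) with arbitrary labels.
sample = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((0.0, -1.0), 0.5)]
w = (1.0, 0.0)
fp_rate, pos_loss = reliable_losses(lambda x: relu(w, x), sample)
print(fp_rate, pos_loss)
```

On this sample the hypothesis outputs 0 on both zero-labeled points, so it makes no false positives, and its only positive-label error comes from the point labeled 0.5.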

Cite this Paper

BibTeX


@InProceedings{pmlr-v65-goel17a,
  title = 	 {Reliably Learning the ReLU in Polynomial Time},
  author = 	 {Goel, Surbhi and Kanade, Varun and Klivans, Adam and Thaler, Justin},
  booktitle = 	 {Proceedings of the 2017 Conference on Learning Theory},
  pages = 	 {1004--1042},
  year = 	 {2017},
  editor = 	 {Kale, Satyen and Shamir, Ohad},
  volume = 	 {65},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v65/goel17a/goel17a.pdf},
  url = 	 {https://proceedings.mlr.press/v65/goel17a.html},
  abstract = 	 {We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \mathsf{max}(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade and Mansour (2012) where the learner is given access to a distribution $\mathcal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to {\em any} distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon = \Omega(1 / \log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLU with fixed polynomial-dependence in the dimension. For depth-2 networks of sigmoids, we obtain the first algorithms that have a polynomial dependency in {\em all} parameters. Our techniques combine kernel methods and polynomial approximations with a “dual-loss” approach to convex programming. As a byproduct we obtain a number of applications including the first set of efficient algorithms for “convex piecewise-linear fitting” and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.}
}

Endnote

%0 Conference Paper
%T Reliably Learning the ReLU in Polynomial Time
%A Surbhi Goel
%A Varun Kanade
%A Adam Klivans
%A Justin Thaler
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-goel17a
%I PMLR
%P 1004--1042
%U https://proceedings.mlr.press/v65/goel17a.html
%V 65
%X We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \mathsf{max}(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade and Mansour (2012) where the learner is given access to a distribution $\mathcal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial-time (in $n$) with respect to any distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon = \Omega(1 / \log n)$ (this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs). These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution. We can compose our results to obtain the first set of efficient algorithms for learning constant-depth networks of ReLU with fixed polynomial-dependence in the dimension. For depth-2 networks of sigmoids, we obtain the first algorithms that have a polynomial dependency in all parameters. Our techniques combine kernel methods and polynomial approximations with a “dual-loss” approach to convex programming. As a byproduct we obtain a number of applications including the first set of efficient algorithms for “convex piecewise-linear fitting” and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.

APA


Goel, S., Kanade, V., Klivans, A., & Thaler, J. (2017). Reliably Learning the ReLU in Polynomial Time. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1004-1042. Available from https://proceedings.mlr.press/v65/goel17a.html.
