Reliably Learning the ReLU in Polynomial Time

Surbhi Goel, Varun Kanade, Adam Klivans, Justin Thaler
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1004-1042, 2017.

Abstract

We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challenging Reliable Agnostic learning model of Kalai, Kanade, and Mansour (2012), where the learner is given access to a distribution $\mathcal{D}$ on labeled examples but the labeling may be arbitrary. We construct a hypothesis that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function. The algorithm runs in polynomial time (in $n$) with respect to any distribution on $\mathbb{S}^{n-1}$ (the unit sphere in $n$ dimensions) and for any error parameter $\epsilon = \Omega(1/\log n)$; this yields a PTAS for a question raised by F. Bach on the complexity of maximizing ReLUs. These results stand in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions on the marginal distribution are required. We can compose our results to obtain the first efficient algorithms for learning constant-depth networks of ReLUs with a fixed polynomial dependence on the dimension. For depth-2 networks of sigmoids, we obtain the first algorithms with a polynomial dependence on all parameters. Our techniques combine kernel methods and polynomial approximations with a “dual-loss” approach to convex programming. As a byproduct, we obtain a number of applications, including the first efficient algorithms for “convex piecewise-linear fitting” and the first efficient algorithms for noisy polynomial reconstruction of low-weight polynomials on the unit sphere.
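For intuition, here is a toy numpy sketch of just one ingredient the abstract mentions, approximating a ReLU over the unit sphere by regression in a polynomial kernel space. It is not the paper's reliable-learning algorithm (in particular, it omits the false-positive-rate constraint and the “dual-loss” convex program); the function name poly_kernel and the parameters degree and lam are illustrative choices made here, and plain kernel ridge regression stands in for the actual learning procedure.

```python
# Toy sketch: fit labels y = max(0, w . x) on the unit sphere S^{n-1}
# by kernel ridge regression with a polynomial kernel. Illustrates the
# "kernel methods + polynomial approximations" idea only; NOT the
# paper's reliable agnostic learning algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, m, degree, lam = 5, 500, 6, 1e-3  # dimension, samples, kernel degree, ridge

# Sample x uniformly from the unit sphere S^{n-1}.
X = rng.normal(size=(m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Unknown unit-norm weight vector w and ReLU labels.
w = rng.normal(size=n)
w /= np.linalg.norm(w)
y = np.maximum(0.0, X @ w)

# Polynomial kernel k(x, x') = (1 + x . x')^degree; since ||x|| = 1,
# functions in this feature space include every polynomial of degree
# at most `degree` in the inner product w . x.
def poly_kernel(A, B):
    return (1.0 + A @ B.T) ** degree

K = poly_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(m), y)  # kernel ridge regression

# Evaluate on fresh points from the sphere.
Xt = rng.normal(size=(200, n))
Xt /= np.linalg.norm(Xt, axis=1, keepdims=True)
pred = poly_kernel(Xt, X) @ alpha
print("test mean squared error:", np.mean((pred - np.maximum(0.0, Xt @ w)) ** 2))
```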

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-goel17a,
  title     = {Reliably Learning the ReLU in Polynomial Time},
  author    = {Goel, Surbhi and Kanade, Varun and Klivans, Adam and Thaler, Justin},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {1004--1042},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/goel17a/goel17a.pdf},
  url       = {https://proceedings.mlr.press/v65/goel17a.html}
}
Endnote
%0 Conference Paper
%T Reliably Learning the ReLU in Polynomial Time
%A Surbhi Goel
%A Varun Kanade
%A Adam Klivans
%A Justin Thaler
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-goel17a
%I PMLR
%P 1004--1042
%U https://proceedings.mlr.press/v65/goel17a.html
%V 65
APA
Goel, S., Kanade, V., Klivans, A. & Thaler, J. (2017). Reliably Learning the ReLU in Polynomial Time. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1004-1042. Available from https://proceedings.mlr.press/v65/goel17a.html.
