The Complexity of Learning Halfspaces using Generalized Linear Methods

Amit Daniely; Nati Linial; Shai Shalev-Shwartz

The Complexity of Learning Halfspaces using Generalized Linear Methods

Amit Daniely, Nati Linial, Shai Shalev-Shwartz

Proceedings of The 27th Conference on Learning Theory, PMLR 35:244-286, 2014.

Abstract

Many popular learning algorithms (E.g. Regression, Fourier-Transform based algorithms, Kernel SVM and Kernel ridge regression) operate by reducing the problem to a convex optimization problem over a set of functions. These methods offer the currently best approach to several central problems such as learning half spaces and learning DNF’s. In addition they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques to show limits on the power of these algorithms. We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability of an example to fall on the wrong side of h or at a distance \leγfrom it. The γ-margin error of the best h is denoted \mathrmErr_γ(D). An α(γ)-approximation algorithm receives γ,εas input and, using i.i.d. samples of D, outputs a classifier with error rate \le α(γ)\mathrmErr_γ(D) + ε. Such an algorithm is efficient if it uses \mathrmpoly(\frac1γ,\frac1ε) samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is O\left(\frac1/γ\sqrt\log(1/γ)\right) and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be \ge Ω\left(\frac1/γ\mathrmpoly\left(\log\left(1/γ\right)\right)\right), essentially matching the best known upper bound.

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-daniely14a,
  title = 	 {The Complexity of Learning Halfspaces using Generalized Linear Methods},
  author = 	 {Daniely, Amit and Linial, Nati and Shalev-Shwartz, Shai},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {244--286},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/daniely14a.pdf},
  url = 	 {https://proceedings.mlr.press/v35/daniely14a.html},
  abstract = 	 {Many popular learning algorithms (E.g. Regression, Fourier-Transform based algorithms, Kernel SVM and Kernel ridge regression) operate by reducing the problem to a convex optimization problem over a set of functions. These methods offer the currently best approach to several central problems such as learning half spaces and learning DNF’s. In addition they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques to show limits on the power of these algorithms. We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability of an example to fall on the wrong side of h or at a distance \leγfrom it. The γ-margin error of the best h is denoted \mathrmErr_γ(D).  An α(γ)-approximation algorithm receives γ,εas input and, using i.i.d. samples of D, outputs a classifier with error rate \le α(γ)\mathrmErr_γ(D) + ε.  Such an algorithm is efficient if it uses \mathrmpoly(\frac1γ,\frac1ε) samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is O\left(\frac1/γ\sqrt\log(1/γ)\right) and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be \ge Ω\left(\frac1/γ\mathrmpoly\left(\log\left(1/γ\right)\right)\right), essentially matching the best known upper bound.}
}

Endnote

%0 Conference Paper
%T The Complexity of Learning Halfspaces using Generalized Linear Methods
%A Amit Daniely
%A Nati Linial
%A Shai Shalev-Shwartz
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári	
%F pmlr-v35-daniely14a
%I PMLR
%P 244--286
%U https://proceedings.mlr.press/v35/daniely14a.html
%V 35
%X Many popular learning algorithms (E.g. Regression, Fourier-Transform based algorithms, Kernel SVM and Kernel ridge regression) operate by reducing the problem to a convex optimization problem over a set of functions. These methods offer the currently best approach to several central problems such as learning half spaces and learning DNF’s. In addition they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques to show limits on the power of these algorithms. We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability of an example to fall on the wrong side of h or at a distance \leγfrom it. The γ-margin error of the best h is denoted \mathrmErr_γ(D).  An α(γ)-approximation algorithm receives γ,εas input and, using i.i.d. samples of D, outputs a classifier with error rate \le α(γ)\mathrmErr_γ(D) + ε.  Such an algorithm is efficient if it uses \mathrmpoly(\frac1γ,\frac1ε) samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is O\left(\frac1/γ\sqrt\log(1/γ)\right) and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be \ge Ω\left(\frac1/γ\mathrmpoly\left(\log\left(1/γ\right)\right)\right), essentially matching the best known upper bound.

RIS


TY  - CPAPER
TI  - The Complexity of Learning Halfspaces using Generalized Linear Methods
AU  - Amit Daniely
AU  - Nati Linial
AU  - Shai Shalev-Shwartz
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári	
ID  - pmlr-v35-daniely14a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 244
EP  - 286
L1  - http://proceedings.mlr.press/v35/daniely14a.pdf
UR  - https://proceedings.mlr.press/v35/daniely14a.html
AB  - Many popular learning algorithms (E.g. Regression, Fourier-Transform based algorithms, Kernel SVM and Kernel ridge regression) operate by reducing the problem to a convex optimization problem over a set of functions. These methods offer the currently best approach to several central problems such as learning half spaces and learning DNF’s. In addition they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques to show limits on the power of these algorithms. We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin γ. Let D be a distribution over labeled examples. The γ-margin error of a hyperplane h is the probability of an example to fall on the wrong side of h or at a distance \leγfrom it. The γ-margin error of the best h is denoted \mathrmErr_γ(D).  An α(γ)-approximation algorithm receives γ,εas input and, using i.i.d. samples of D, outputs a classifier with error rate \le α(γ)\mathrmErr_γ(D) + ε.  Such an algorithm is efficient if it uses \mathrmpoly(\frac1γ,\frac1ε) samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is O\left(\frac1/γ\sqrt\log(1/γ)\right) and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be \ge Ω\left(\frac1/γ\mathrmpoly\left(\log\left(1/γ\right)\right)\right), essentially matching the best known upper bound.
ER  -

APA


Daniely, A., Linial, N. & Shalev-Shwartz, S.. (2014). The Complexity of Learning Halfspaces using Generalized Linear Methods. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:244-286 Available from https://proceedings.mlr.press/v35/daniely14a.html.

Related Material

Download PDF