Proper Learning, Helly Number, and an Optimal SVM Bound

Olivier Bousquet; Steve Hanneke; Shay Moran; Nikita Zhivotovskiy

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:582-609, 2020.

Abstract

The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/\epsilon)$ which is known to be necessary for ERM in general. It was recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C does not include this log factor and is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. We show that these classes can be characterized by the dual Helly number, which is a combinatorial parameter that arises in discrete geometry and abstract convexity. In particular, under general conditions on C, we show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal dependence on $\epsilon$. As further implications of our techniques we resolve a long-standing open problem posed by Vapnik and Chervonenkis (1974) on the performance of the Support Vector Machine by proving that the sample complexity of SVM in the realizable case is $\Theta((n/\epsilon)+(1/\epsilon)\log(1/\delta))$, where $n$ is the dimension. This gives the first optimal PAC bound for Halfspaces achieved by a proper learning algorithm, which is moreover computationally efficient.
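
For orientation, the bounds contrasted in the abstract can be written side by side. This is a sketch in standard notation rather than a statement taken from the paper: the symbol $d$ for the VC dimension of C and the names $m_{\mathrm{ERM}}$, $m_{\mathrm{opt}}$, $m_{\mathrm{SVM}}$ are assumptions introduced here, and constants are suppressed.

$$
\begin{aligned}
m_{\mathrm{ERM}}(\epsilon,\delta) &= O\!\left(\frac{d}{\epsilon}\log\frac{1}{\epsilon} + \frac{1}{\epsilon}\log\frac{1}{\delta}\right) && \text{(any ERM, realizable case)}\\
m_{\mathrm{opt}}(\epsilon,\delta) &= \Theta\!\left(\frac{d}{\epsilon} + \frac{1}{\epsilon}\log\frac{1}{\delta}\right) && \text{(optimal, Hanneke 2016)}\\
m_{\mathrm{SVM}}(\epsilon,\delta) &= \Theta\!\left(\frac{n}{\epsilon} + \frac{1}{\epsilon}\log\frac{1}{\delta}\right) && \text{(SVM, halfspaces in dimension } n\text{)}
\end{aligned}
$$

The only difference between the first two lines is the $\log(1/\epsilon)$ factor on the leading term. Since the VC dimension of halfspaces in dimension $n$ is of order $n$, the third line matches the optimal rate up to constants.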

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-bousquet20a,
  title =     {Proper Learning, Helly Number, and an Optimal SVM Bound},
  author =    {Bousquet, Olivier and Hanneke, Steve and Moran, Shay and Zhivotovskiy, Nikita},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages =     {582--609},
  year =      {2020},
  editor =    {Abernethy, Jacob and Agarwal, Shivani},
  volume =    {125},
  series =    {Proceedings of Machine Learning Research},
  month =     {09--12 Jul},
  publisher = {PMLR},
  pdf =       {http://proceedings.mlr.press/v125/bousquet20a/bousquet20a.pdf},
  url =       {https://proceedings.mlr.press/v125/bousquet20a.html},
  abstract = 	 { The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/\epsilon)$ which is known to be necessary for ERM in general. It has been recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C does not include this log factor and is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. We identify that these classes can be characterized by the dual Helly number, which is a combinatorial parameter that arises in discrete geometry and abstract convexity. In particular, under general conditions on C, we show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal dependence on $\epsilon$. As further implications of our techniques we resolve a long-standing open problem posed by Vapnik and Chervonenkis (1974) on the performance of the Support Vector Machine by proving that the sample complexity of SVM in the realizable case is $\Theta((n/\epsilon)+(1/\epsilon)\log(1/\delta))$, where $n$ is the dimension. This gives the first optimal PAC bound for Halfspaces achieved by a proper learning algorithm, and moreover is computationally efficient.}
}

Endnote

%0 Conference Paper
%T Proper Learning, Helly Number, and an Optimal SVM Bound
%A Olivier Bousquet
%A Steve Hanneke
%A Shay Moran
%A Nikita Zhivotovskiy
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-bousquet20a
%I PMLR
%P 582--609
%U https://proceedings.mlr.press/v125/bousquet20a.html
%V 125
%X  The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/\epsilon)$ which is known to be necessary for ERM in general. It has been recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C does not include this log factor and is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. We identify that these classes can be characterized by the dual Helly number, which is a combinatorial parameter that arises in discrete geometry and abstract convexity. In particular, under general conditions on C, we show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal dependence on $\epsilon$. As further implications of our techniques we resolve a long-standing open problem posed by Vapnik and Chervonenkis (1974) on the performance of the Support Vector Machine by proving that the sample complexity of SVM in the realizable case is $\Theta((n/\epsilon)+(1/\epsilon)\log(1/\delta))$, where $n$ is the dimension. This gives the first optimal PAC bound for Halfspaces achieved by a proper learning algorithm, and moreover is computationally efficient.

APA


Bousquet, O., Hanneke, S., Moran, S. & Zhivotovskiy, N. (2020). Proper Learning, Helly Number, and an Optimal SVM Bound. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:582-609. Available from https://proceedings.mlr.press/v125/bousquet20a.html.
