Optimal learners for multiclass problems

Amit Daniely; Shai Shalev-Shwartz

Proceedings of The 27th Conference on Learning Theory, PMLR 35:287-316, 2014.

Abstract

The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, that is, it must be able to output hypotheses that do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of “how to learn?” We give a complete answer to this question through a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. We then turn to the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).
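The abstract contrasts ERM, which always outputs a hypothesis from the class (a proper learner), with improper learners. The following minimal Python sketch, which is not taken from the paper, makes these terms concrete for a small finite class of generalized linear multiclass classifiers of the form x -> argmax_y <w, Ψ(x, y)>. The feature map multivector_psi (which copies x into the block reserved for label y), the helper names, and the toy sample are illustrative assumptions. The sketch does not implement the one-inclusion learner or the paper's efficient learners.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]                 # feature vector x
Example = Tuple[Point, int]               # labelled example (x, y)
Hypothesis = Callable[[Point], int]       # maps x to a label in {0, ..., k-1}


def empirical_error(h: Hypothesis, sample: Sequence[Example]) -> float:
    """Fraction of examples (x, y) in the sample that h mislabels."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def erm(hypothesis_class: Sequence[Hypothesis], sample: Sequence[Example]) -> Hypothesis:
    """ERM rule: return a hypothesis from the class with minimal empirical error.

    The output is always a member of the class (a proper learner). Ties are
    broken by taking the first minimizer; this is just one possible ERM.
    """
    return min(hypothesis_class, key=lambda h: empirical_error(h, sample))


def multivector_psi(x: Point, y: int, num_labels: int) -> List[float]:
    """Assumed class-sensitive feature map: copy x into the block reserved for label y."""
    d = len(x)
    out = [0.0] * (d * num_labels)
    out[y * d:(y + 1) * d] = x
    return out


def linear_multiclass(w: Sequence[float], num_labels: int) -> Hypothesis:
    """Linear classifier x -> argmax_y <w, psi(x, y)>; ties go to the smallest label."""
    def h(x: Point) -> int:
        scores = [sum(wi * pi for wi, pi in zip(w, multivector_psi(x, y, num_labels)))
                  for y in range(num_labels)]
        return max(range(num_labels), key=lambda y: scores[y])
    return h


# Usage: 3 labels and 2 features, so each weight vector w has 3 blocks of size 2.
weights = [
    [0.0] * 6,
    [1.0, 0.0, 0.0, 1.0, -1.0, -1.0],   # the hypothesis that labels the sample
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
]
H = [linear_multiclass(w, num_labels=3) for w in weights]
target = H[1]
sample = [(x, target(x)) for x in [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]]
h_hat = erm(H, sample)
print(empirical_error(h_hat, sample))  # 0.0
```

Here the sample is labelled by a member of the class, so ERM finds a consistent hypothesis. The paper's result concerns how many examples such a rule needs compared with an optimal, possibly improper, learner, which a toy run like this cannot show.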

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-daniely14b,
  title = 	 {Optimal learners for multiclass problems},
  author = 	 {Daniely, Amit and Shalev-Shwartz, Shai},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {287--316},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/daniely14b.pdf},
  url = 	 {https://proceedings.mlr.press/v35/daniely14b.html},
  abstract = 	 {The fundamental theorem of statistical learning states that for \emph{binary} classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for \emph{multiclass} prediction. We start by proving a surprising result: a generic optimal multiclass learner must be \emph{improper}, that is, it must be able to output hypotheses that do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of “how to learn?” We give a complete answer to this question through a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. We then turn to the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).}
}

Endnote

%0 Conference Paper
%T Optimal learners for multiclass problems
%A Amit Daniely
%A Shai Shalev-Shwartz
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-daniely14b
%I PMLR
%P 287--316
%U https://proceedings.mlr.press/v35/daniely14b.html
%V 35
%X The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, that is, it must be able to output hypotheses that do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of “how to learn?” We give a complete answer to this question through a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. We then turn to the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).

RIS


TY  - CPAPER
TI  - Optimal learners for multiclass problems
AU  - Amit Daniely
AU  - Shai Shalev-Shwartz
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-daniely14b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 287
EP  - 316
L1  - http://proceedings.mlr.press/v35/daniely14b.pdf
UR  - https://proceedings.mlr.press/v35/daniely14b.html
AB  - The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, that is, it must be able to output hypotheses that do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of “how to learn?” We give a complete answer to this question through a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. We then turn to the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).
ER  -

APA


Daniely, A., & Shalev-Shwartz, S. (2014). Optimal learners for multiclass problems. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:287-316. Available from https://proceedings.mlr.press/v35/daniely14b.html.