The sample complexity of agnostic learning under deterministic labels

Shai Ben-David; Ruth Urner

Proceedings of The 27th Conference on Learning Theory, PMLR 35:527-542, 2014.

Abstract

With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from Õ(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity Õ(d/ε), any proper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v35-ben-david14,
  title = 	 {The sample complexity of agnostic learning under deterministic labels},
  author = 	 {Ben-David, Shai and Urner, Ruth},
  booktitle = 	 {Proceedings of The 27th Conference on Learning Theory},
  pages = 	 {527--542},
  year = 	 {2014},
  editor = 	 {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume = 	 {35},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {13--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v35/ben-david14.pdf},
  url = 	 {https://proceedings.mlr.press/v35/ben-david14.html},
  abstract = 	 {With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any $d$, we present classes of VC-dimension $d$ that are learnable from $\tilde{O}(d/\epsilon)$-many samples and classes that require samples of size $\Omega(d/\epsilon^2)$. Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity $\tilde{O}(d/\epsilon)$, any \emph{proper} (and therefore, any ERM) algorithm requires $\Omega(d/\epsilon^2)$ samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.}
}

Endnote

%0 Conference Paper
%T The sample complexity of agnostic learning under deterministic labels
%A Shai Ben-David
%A Ruth Urner
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-ben-david14
%I PMLR
%P 527--542
%U https://proceedings.mlr.press/v35/ben-david14.html
%V 35
%X With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from Õ(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity Õ(d/ε), any proper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.

RIS


TY  - CPAPER
TI  - The sample complexity of agnostic learning under deterministic labels
AU  - Shai Ben-David
AU  - Ruth Urner
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-ben-david14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 527
EP  - 542
L1  - http://proceedings.mlr.press/v35/ben-david14.pdf
UR  - https://proceedings.mlr.press/v35/ben-david14.html
AB  - With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from Õ(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity Õ(d/ε), any proper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.
ER  -

APA


Ben-David, S. & Urner, R. (2014). The sample complexity of agnostic learning under deterministic labels. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:527-542. Available from https://proceedings.mlr.press/v35/ben-david14.html.
