The sample complexity of agnostic learning under deterministic labels

Shai Ben-David, Ruth Urner
; Proceedings of The 27th Conference on Learning Theory, PMLR 35:527-542, 2014.

Abstract

With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from \tilde O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity \tilde O(d/ε), any \emphproper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v35-ben-david14, title = {The sample complexity of agnostic learning under deterministic labels}, author = {Shai Ben-David and Ruth Urner}, booktitle = {Proceedings of The 27th Conference on Learning Theory}, pages = {527--542}, year = {2014}, editor = {Maria Florina Balcan and Vitaly Feldman and Csaba Szepesvári}, volume = {35}, series = {Proceedings of Machine Learning Research}, address = {Barcelona, Spain}, month = {13--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v35/ben-david14.pdf}, url = {http://proceedings.mlr.press/v35/ben-david14.html}, abstract = {With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from \tilde O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity \tilde O(d/ε), any \emphproper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.} }
Endnote
%0 Conference Paper %T The sample complexity of agnostic learning under deterministic labels %A Shai Ben-David %A Ruth Urner %B Proceedings of The 27th Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2014 %E Maria Florina Balcan %E Vitaly Feldman %E Csaba Szepesvári %F pmlr-v35-ben-david14 %I PMLR %J Proceedings of Machine Learning Research %P 527--542 %U http://proceedings.mlr.press %V 35 %W PMLR %X With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from \tilde O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity \tilde O(d/ε), any \emphproper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.
RIS
TY - CPAPER TI - The sample complexity of agnostic learning under deterministic labels AU - Shai Ben-David AU - Ruth Urner BT - Proceedings of The 27th Conference on Learning Theory PY - 2014/05/29 DA - 2014/05/29 ED - Maria Florina Balcan ED - Vitaly Feldman ED - Csaba Szepesvári ID - pmlr-v35-ben-david14 PB - PMLR SP - 527 DP - PMLR EP - 542 L1 - http://proceedings.mlr.press/v35/ben-david14.pdf UR - http://proceedings.mlr.press/v35/ben-david14.html AB - With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that it displays a behavior that is quite different from that of the fundamental results of the common (PAC) learning setups. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from \tilde O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity. While the class can be learned with sample complexity \tilde O(d/ε), any \emphproper (and therefore, any ERM) algorithm requires Ω(d/ε^2) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions. ER -
APA
Ben-David, S. & Urner, R.. (2014). The sample complexity of agnostic learning under deterministic labels. Proceedings of The 27th Conference on Learning Theory, in PMLR 35:527-542

Related Material