Agnostic Sample Compression Schemes for Regression

Idan Attias; Steve Hanneke; Aryeh Kontorovich; Menachem Sadigurschi

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:2069-2085, 2024.

Abstract

We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff (2016) for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of polynomial size in the pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth’s classic sample compression conjecture for realizable-case classification (Warmuth, 2003).

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-attias24b,
  title = 	 {Agnostic Sample Compression Schemes for Regression},
  author =       {Attias, Idan and Hanneke, Steve and Kontorovich, Aryeh and Sadigurschi, Menachem},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {2069--2085},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/attias24b/attias24b.pdf},
  url = 	 {https://proceedings.mlr.press/v235/attias24b.html},
  abstract = 	 {We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff (2016) for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of polynomial size in the pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth’s classic sample compression conjecture for realizable-case classification (Warmuth, 2003).}
}

Endnote

%0 Conference Paper
%T Agnostic Sample Compression Schemes for Regression
%A Idan Attias
%A Steve Hanneke
%A Aryeh Kontorovich
%A Menachem Sadigurschi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-attias24b
%I PMLR
%P 2069--2085
%U https://proceedings.mlr.press/v235/attias24b.html
%V 235
%X We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff (2016) for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of polynomial size in the pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth’s classic sample compression conjecture for realizable-case classification (Warmuth, 2003).

APA

Attias, I., Hanneke, S., Kontorovich, A., & Sadigurschi, M. (2024). Agnostic Sample Compression Schemes for Regression. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:2069-2085. Available from https://proceedings.mlr.press/v235/attias24b.html.
