Open Problem: Data Selection for Regression Tasks

Steve Hanneke, Shay Moran, Alexander Shlimovich, Amir Yehudayoff
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:6225-6229, 2025.

Abstract

This note proposes a set of open problems concerning data selection in regression tasks. The central question is: given a natural learning rule $\mathcal{A}$ and a selection budget $n$, how well can $\mathcal{A}$ perform when trained on $n$ examples selected from a larger dataset? We present concrete instances of this question in basic regression settings, including mean estimation and linear regression.
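To make the central question concrete, here is a minimal sketch for the mean estimation setting. It is an illustration under stated assumptions, not the formulation used in the paper: the learning rule $\mathcal{A}$ is taken to be the empirical mean, the selector is assumed to see the whole dataset, and quality is measured by distance to the full-data mean. The names `empirical_mean` and `greedy_select` are hypothetical.

```python
import random


def empirical_mean(values):
    """Assumed learning rule A: output the average of the training examples."""
    return sum(values) / len(values)


def greedy_select(data, n):
    """Hypothetical selector: pick n examples, without replacement, so that
    the mean of the selected subset stays close to the full-data mean."""
    target = empirical_mean(data)
    remaining = list(data)
    chosen = []
    total = 0.0
    for k in range(1, n + 1):
        # Pick the remaining point that brings the subset mean closest to the target.
        best = min(
            range(len(remaining)),
            key=lambda i: abs((total + remaining[i]) / k - target),
        )
        total += remaining[best]
        chosen.append(remaining.pop(best))
    return chosen


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # the larger dataset
n = 10                                             # the selection budget

selected = greedy_select(data, n)
random_subset = rng.sample(data, n)

full_mean = empirical_mean(data)
greedy_error = abs(empirical_mean(selected) - full_mean)
random_error = abs(empirical_mean(random_subset) - full_mean)
```

Comparing `greedy_error` with `random_error` gives a rough sense of how much a selector can help a fixed learning rule at a given budget $n$. The greedy rule is only one possible selector and is not claimed to be optimal.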

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-hanneke25e,
  title = {Open Problem: Data Selection for Regression Tasks},
  author = {Hanneke, Steve and Moran, Shay and Shlimovich, Alexander and Yehudayoff, Amir},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages = {6225--6229},
  year = {2025},
  editor = {Haghtalab, Nika and Moitra, Ankur},
  volume = {291},
  series = {Proceedings of Machine Learning Research},
  month = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/hanneke25e/hanneke25e.pdf},
  url = {https://proceedings.mlr.press/v291/hanneke25e.html},
  abstract = {This note proposes a set of open problems concerning data selection in regression tasks. The central question is: given a natural learning rule $\mathcal{A}$ and a selection budget $n$, how well can $\mathcal{A}$ perform when trained on $n$ examples selected from a larger dataset? We present concrete instances of this question in basic regression settings, including mean estimation and linear regression.}
}
Endnote
%0 Conference Paper
%T Open Problem: Data Selection for Regression Tasks
%A Steve Hanneke
%A Shay Moran
%A Alexander Shlimovich
%A Amir Yehudayoff
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-hanneke25e
%I PMLR
%P 6225--6229
%U https://proceedings.mlr.press/v291/hanneke25e.html
%V 291
%X This note proposes a set of open problems concerning data selection in regression tasks. The central question is: given a natural learning rule $\mathcal{A}$ and a selection budget $n$, how well can $\mathcal{A}$ perform when trained on $n$ examples selected from a larger dataset? We present concrete instances of this question in basic regression settings, including mean estimation and linear regression.
APA
Hanneke, S., Moran, S., Shlimovich, A. & Yehudayoff, A. (2025). Open Problem: Data Selection for Regression Tasks. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:6225-6229. Available from https://proceedings.mlr.press/v291/hanneke25e.html.
