Data-splitting improves statistical performance in overparameterized regimes

Nicole Muecke; Enrico Reiss; Jonas Rungenhagen; Markus Klein

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10322-10350, 2022.

Abstract

While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time-consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single-machine setting that overparameterization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows us to analyze both the finite- and infinite-dimensional settings. We numerically demonstrate the effect of different model parameters.
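The data-splitting idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the distributed estimator is the plain average of local minimum-norm (ridgeless) least-squares solutions, and the helper names, the Gaussian design, the noise level, and the split counts are all hypothetical choices made for illustration.

```python
import numpy as np


def min_norm_interpolator(X, y):
    # Ridgeless regression: minimum-norm least-squares solution via the pseudoinverse.
    return np.linalg.pinv(X) @ y


def averaged_estimator(X, y, num_splits):
    # Assumed scheme: split the rows into disjoint blocks, fit each block
    # locally, then average the local solutions.
    X_blocks = np.array_split(X, num_splits)
    y_blocks = np.array_split(y, num_splits)
    local = [min_norm_interpolator(Xb, yb) for Xb, yb in zip(X_blocks, y_blocks)]
    return np.mean(local, axis=0)


# Synthetic overparameterized problem (d > n); all values are illustrative.
rng = np.random.default_rng(0)
n, d = 200, 400
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)

for m in (1, 2, 4, 8):
    w_hat = averaged_estimator(X, y, m)
    # Squared parameter error; equals the excess risk for isotropic covariates.
    err = np.sum((w_hat - w_true) ** 2)
    print(f"splits={m}  squared error={err:.4f}")
```

With a single split the estimator interpolates the training data exactly; with more splits each local problem is smaller and cheaper to solve, and the averaged solution no longer interpolates. Whether the error actually decreases depends on the noise level and the covariance structure, which this toy setup does not explore.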

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-muecke22a,
  title = 	 {Data-splitting improves statistical performance in overparameterized regimes},
  author =       {Muecke, Nicole and Reiss, Enrico and Rungenhagen, Jonas and Klein, Markus},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {10322--10350},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/muecke22a/muecke22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/muecke22a.html},
  abstract = 	 { While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparameterization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters. }
}

Endnote

%0 Conference Paper
%T Data-splitting improves statistical performance in overparameterized regimes
%A Nicole Muecke
%A Enrico Reiss
%A Jonas Rungenhagen
%A Markus Klein
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-muecke22a
%I PMLR
%P 10322--10350
%U https://proceedings.mlr.press/v151/muecke22a.html
%V 151
%X  While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparameterization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters.

APA


Muecke, N., Reiss, E., Rungenhagen, J. & Klein, M. (2022). Data-splitting improves statistical performance in overparameterized regimes. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:10322-10350. Available from https://proceedings.mlr.press/v151/muecke22a.html.
