Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation

Loucas Pillaud Vivien; Julien Reygner; Nicolas Flammarion

Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:2127-2159, 2022.

Abstract

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of the label noise in the training dynamics of a quadratically parametrised model through its continuous time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly solves a Lasso program. To fully complete our analysis, we provide nonasymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and help explain the greater performances of stochastic dynamics as observed in practice.
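The training dynamics described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a linear model whose weights use the quadratic parametrisation beta = w ⊙ w (elementwise square), a squared loss (1/2n)·||X beta - y||², and full-batch gradient descent in which a fresh Gaussian perturbation is added to every label at each step. The function name `label_noise_gd` and its parameters (`step`, `noise`, `n_iter`, `init`, `seed`) are hypothetical choices for illustration. The paper's exact parametrisation, noise model and step-size regime may differ, and the paper analyses a continuous-time (stochastic flow) version of the dynamics rather than this discrete loop.

```python
import random


def label_noise_gd(X, y, step=0.01, noise=0.5, n_iter=20000, init=0.5, seed=0):
    """Hypothetical sketch of label-noise gradient descent on beta = w * w.

    X is a list of n rows with d features each; y is a list of n labels.
    Returns the learned beta = w ⊙ w, which is always nonnegative.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [init] * d  # uniform positive initialisation (an assumption)
    for _ in range(n_iter):
        beta = [wj * wj for wj in w]  # quadratic parametrisation
        # Residuals against freshly perturbed labels y_i + noise * xi_i.
        r = [
            sum(X[i][j] * beta[j] for j in range(d))
            - (y[i] + noise * rng.gauss(0.0, 1.0))
            for i in range(n)
        ]
        for j in range(d):
            # Gradient of the loss w.r.t. beta_j, then chain rule through beta_j = w_j^2.
            g_beta = sum(X[i][j] * r[i] for i in range(n)) / n
            w[j] -= step * 2.0 * w[j] * g_beta
    return [wj * wj for wj in w]
```

With `noise=0.0` the loop reduces to plain gradient descent on the quadratically parametrised model. One informal way to see why an ℓ1 (Lasso-type) penalty can appear is that, under beta = w ⊙ w, the squared Euclidean norm ||w||² equals ||beta||₁. The rigorous statement and its conditions are given in the paper.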

Cite this Paper

BibTeX


@InProceedings{pmlr-v178-vivien22a,
  title = 	 {Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation},
  author =       {Vivien, Loucas Pillaud and Reygner, Julien and Flammarion, Nicolas},
  booktitle = 	 {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages = 	 {2127--2159},
  year = 	 {2022},
  editor = 	 {Loh, Po-Ling and Raginsky, Maxim},
  volume = 	 {178},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--05 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v178/vivien22a/vivien22a.pdf},
  url = 	 {https://proceedings.mlr.press/v178/vivien22a.html},
  abstract = 	 {Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of the label noise in the training dynamics of a quadratically parametrised model through its continuous time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly solves a Lasso program. To fully complete our analysis, we provide nonasymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and help explain the greater performances of stochastic dynamics as observed in practice.}
}

Endnote

%0 Conference Paper
%T Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
%A Loucas Pillaud Vivien
%A Julien Reygner
%A Nicolas Flammarion
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-vivien22a
%I PMLR
%P 2127--2159
%U https://proceedings.mlr.press/v178/vivien22a.html
%V 178
%X Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of the label noise in the training dynamics of a quadratically parametrised model through its continuous time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly solves a Lasso program. To fully complete our analysis, we provide nonasymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and help explain the greater performances of stochastic dynamics as observed in practice.

APA


Vivien, L.P., Reygner, J. & Flammarion, N. (2022). Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:2127-2159. Available from https://proceedings.mlr.press/v178/vivien22a.html.
