Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent

Santhosh Karnik, Anna Veselovska, Mark Iwen, Felix Krahmer
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29148-29204, 2025.

Abstract

We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, motivated by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated by an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of a small random initialization.
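
To make the setting concrete, the following minimal NumPy sketch (not the authors' code) implements the tubal tensor product (t-product) via the FFT along the tube dimension and runs plain gradient descent on a two-factor overparametrized factorization X * Y of a low-tubal-rank target. The factorization model, the step size eta, and the initialization scale alpha below are illustrative assumptions, not values taken from the paper.

import numpy as np

def t_product(A, B):
    # t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3): FFT along the
    # tube (third) axis, slice-wise matrix products, inverse FFT back.
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)
    return np.real(np.fft.ifft(Ch, axis=2))

def t_transpose(A):
    # t-transpose: transpose each frontal slice and reverse the order of
    # slices 2..n3 (conjugate transposition in the Fourier domain).
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def tubal_rank(T, tol=1e-6):
    # Tubal rank = number of nonzero singular tubes of the t-SVD, i.e. the
    # maximum rank over the Fourier-domain frontal slices.
    Th = np.fft.fft(T, axis=2)
    return max(np.linalg.matrix_rank(Th[:, :, k], tol=tol)
               for k in range(T.shape[2]))

rng = np.random.default_rng(0)
n, r, n3 = 20, 3, 8                     # illustrative sizes; target tubal rank <= r
T = t_product(rng.standard_normal((n, r, n3)),
              rng.standard_normal((r, n, n3)))

p = n                                   # overparametrized inner dimension, p >> r
alpha, eta = 1e-3, 1e-3                 # small random init scale and step size (assumed)
X = alpha * rng.standard_normal((n, p, n3))
Y = alpha * rng.standard_normal((p, n, n3))

for _ in range(20000):
    R = t_product(X, Y) - T             # residual of the current factorization
    gX = t_product(R, t_transpose(Y))   # gradient of 0.5 * ||X*Y - T||_F^2 in X
    gY = t_product(t_transpose(X), R)   # gradient in Y
    X, Y = X - eta * gX, Y - eta * gY

print("residual norm:", np.linalg.norm(t_product(X, Y) - T))
print("tubal rank of X*Y:", tubal_rank(t_product(X, Y), tol=1e-3))

The scale alpha plays the role of the small random initialization in the main theorem: keeping it small lets the dynamics escape the lazy regime and settle on a low-tubal-rank solution, whereas a large alpha would not be expected to show this bias.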

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-karnik25a,
  title     = {Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent},
  author    = {Karnik, Santhosh and Veselovska, Anna and Iwen, Mark and Krahmer, Felix},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29148--29204},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/karnik25a/karnik25a.pdf},
  url       = {https://proceedings.mlr.press/v267/karnik25a.html},
  abstract  = {We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, motivated by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated by an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of a small random initialization.}
}
Endnote
%0 Conference Paper
%T Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent
%A Santhosh Karnik
%A Anna Veselovska
%A Mark Iwen
%A Felix Krahmer
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-karnik25a
%I PMLR
%P 29148--29204
%U https://proceedings.mlr.press/v267/karnik25a.html
%V 267
%X We provide a rigorous analysis of implicit regularization in an overparametrized tensor factorization problem beyond the lazy training regime. For matrix factorization problems, this phenomenon has been studied in a number of works. A particular challenge has been to design universal initialization strategies which provably lead to implicit regularization in gradient-descent methods. At the same time, it has been argued by Cohen et al. (2016) that more general classes of neural networks can be captured by considering tensor factorizations. However, in the tensor case, implicit regularization has only been rigorously established for gradient flow or in the lazy training regime. In this paper, we prove the first tensor result of its kind for gradient descent rather than gradient flow. We focus on the tubal tensor product and the associated notion of low tubal rank, motivated by the relevance of this model for image data. We establish that gradient descent in an overparametrized tensor factorization model with a small random initialization exhibits an implicit bias towards solutions of low tubal rank. Our theoretical findings are illustrated by an extensive set of numerical simulations showcasing the dynamics predicted by our theory as well as the crucial role of a small random initialization.
APA
Karnik, S., Veselovska, A., Iwen, M., & Krahmer, F. (2025). Implicit Regularization for Tubal Tensor Factorizations via Gradient Descent. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29148-29204. Available from https://proceedings.mlr.press/v267/karnik25a.html.
