Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression

Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:736-770, 2025.

Abstract

We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix Z between the original features X and a set of new features X̃ in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing Z, a non-commutative polynomial of random matrices, with the sample covariance matrix of X asymptotically, we demonstrate that batching affects the dynamics by resulting in a form of shrinkage on the spectrum.
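
As a rough illustration of the setting the abstract describes (not code from the paper), the sketch below runs mini-batch gradient descent with random reshuffling on a synthetic least squares problem and compares it against full-batch gradient descent whose step size is scaled by the number of mini-batches per epoch, following the linear scaling rule. The problem sizes, step size, and function names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic least squares instance: y = X @ w_star + noise.
# All sizes below are illustrative, not the paper's setup.
n, d = 256, 32
batch_size, step_size, n_epochs = 16, 0.02, 100
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

def minibatch_gd_rr(X, y, batch_size, step_size, n_epochs):
    # Random reshuffling: each epoch draws a fresh permutation of the
    # samples, then sweeps once through the disjoint mini-batches.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the per-batch loss ||Xb @ w - yb||^2 / (2 |batch|).
            w -= step_size * Xb.T @ (Xb @ w - yb) / len(idx)
    return w

def fullbatch_gd(X, y, batch_size, step_size, n_epochs):
    # Full-batch baseline under the linear scaling rule: one step per
    # epoch, with the step size multiplied by the number of mini-batches.
    n, d = X.shape
    k = n // batch_size
    w = np.zeros(d)
    for _ in range(n_epochs):
        w -= k * step_size * X.T @ (X @ w - y) / n
    return w

w_mb = minibatch_gd_rr(X, y, batch_size, step_size, n_epochs)
w_fb = fullbatch_gd(X, y, batch_size, step_size, n_epochs)
print("distance between iterates:", np.linalg.norm(w_mb - w_fb))
print("mini-batch train MSE:", np.mean((X @ w_mb - y) ** 2))
print("full-batch train MSE:", np.mean((X @ w_fb - y) ** 2))

Under the linear scaling rule the two trajectories agree to leading order in the step size; the paper's point is that the mini-batch iterates retain a subtler step-size dependence (e.g., in their limit) that this first-order matching, and a gradient flow analysis, would miss.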

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-lok25a,
  title     = {Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression},
  author    = {Lok, Jackie and Sonthalia, Rishi and Rebrova, Elizaveta},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {736--770},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/lok25a/lok25a.pdf},
  url       = {https://proceedings.mlr.press/v272/lok25a.html},
  abstract  = {We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$ in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing $Z$, a non-commutative polynomial of random matrices, with the sample covariance matrix of $X$ asymptotically, we demonstrate that batching affects the dynamics by resulting in a form of shrinkage on the spectrum.}
}
Endnote
%0 Conference Paper
%T Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression
%A Jackie Lok
%A Rishi Sonthalia
%A Elizaveta Rebrova
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-lok25a
%I PMLR
%P 736--770
%U https://proceedings.mlr.press/v272/lok25a.html
%V 272
%X We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$ in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing $Z$, a non-commutative polynomial of random matrices, with the sample covariance matrix of $X$ asymptotically, we demonstrate that batching affects the dynamics by resulting in a form of shrinkage on the spectrum.
APA
Lok, J., Sonthalia, R., & Rebrova, E. (2025). Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:736-770. Available from https://proceedings.mlr.press/v272/lok25a.html.
