On Convergence of Incremental Gradient for Non-convex Smooth Functions

Anastasia Koloskova; Nikita Doikov; Sebastian U Stich; Martin Jaggi

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:25058-25086, 2024.

Abstract

In machine learning and neural network optimization, algorithms such as incremental gradient, single shuffle SGD, and random reshuffle SGD are popular because of their cache-mismatch efficiency and good practical convergence behavior. However, their theoretical optimization properties, especially for non-convex smooth functions, are not yet fully understood. This paper studies the convergence of SGD algorithms with arbitrary data ordering within a broad framework for non-convex smooth functions. Our findings show improved convergence guarantees for incremental gradient and single shuffle SGD. In particular, if $n$ is the training set size, we improve the optimization term of the convergence guarantee for reaching accuracy $\epsilon$ by a factor of $n$, from $O \left( \frac{n}{\epsilon} \right)$ to $O \left( \frac{1}{\epsilon}\right)$.
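The three methods named in the abstract all make one pass over the $n$ training examples per epoch and differ only in the order in which they visit them. The minimal Python sketch below is written for illustration and is not taken from the paper. Incremental gradient uses the fixed order 0, 1, ..., n-1. Single shuffle draws one random permutation and reuses it every epoch. Random reshuffle draws a fresh permutation each epoch. The toy objective $f_i(x) = \log(1 + (x - c_i)^2)$, the step size, the starting point, and every function name (`epoch_orders`, `grad_fi`, `full_grad`, `run`) are hypothetical choices. The sketch does not reproduce the paper's analysis, bounds, or step-size rules.

```python
import math
import random


def epoch_orders(n, num_epochs, scheme, seed=0):
    """Yield the index order used in each epoch under a given data-ordering scheme."""
    rng = random.Random(seed)
    fixed_perm = rng.sample(range(n), n)  # single shuffle: one permutation, reused every epoch
    for _ in range(num_epochs):
        if scheme == "incremental":
            yield list(range(n))  # same deterministic order 0, 1, ..., n-1 every epoch
        elif scheme == "single_shuffle":
            yield list(fixed_perm)
        elif scheme == "random_reshuffle":
            yield rng.sample(range(n), n)  # fresh permutation drawn each epoch
        else:
            raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical toy finite sum, smooth and non-convex:
# f_i(x) = log(1 + (x - c_i)^2),  F(x) = (1/n) * sum_i f_i(x)
def grad_fi(x, c):
    """Derivative of f_i at x."""
    u = x - c
    return 2.0 * u / (1.0 + u * u)


def full_grad(x, cs):
    """Derivative of the average objective F at x."""
    return sum(grad_fi(x, c) for c in cs) / len(cs)


def run(scheme, cs, x0=3.0, step=0.01, num_epochs=300, seed=0):
    """SGD without replacement: each epoch takes one step per component, in the scheme's order."""
    x = x0
    for order in epoch_orders(len(cs), num_epochs, scheme, seed):
        for i in order:
            x -= step * grad_fi(x, cs[i])
    return x


if __name__ == "__main__":
    n = 10
    cs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # evenly spaced centers in [-1, 1]
    print(f"initial |F'(x)| = {abs(full_grad(3.0, cs)):.4f}")
    for scheme in ("incremental", "single_shuffle", "random_reshuffle"):
        x = run(scheme, cs)
        print(f"{scheme:>16}: x = {x:+.4f}, |F'(x)| = {abs(full_grad(x, cs)):.4f}")
```

Running the script prints the full-gradient magnitude at the starting point and at the last iterate for each scheme. Only `epoch_orders` changes between the three variants, so swapping in any other permutation rule (an "arbitrary data ordering") requires no change to the update loop.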

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-koloskova24a,
  title     = {On Convergence of Incremental Gradient for Non-convex Smooth Functions},
  author    = {Koloskova, Anastasia and Doikov, Nikita and Stich, Sebastian U and Jaggi, Martin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {25058--25086},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/koloskova24a/koloskova24a.pdf},
  url       = {https://proceedings.mlr.press/v235/koloskova24a.html},
  abstract  = {In machine learning and neural network optimization, algorithms like incremental gradient, single shuffle SGD, and random reshuffle SGD are popular due to their cache-mismatch efficiency and good practical convergence behavior. However, their optimization properties in theory, especially for non-convex smooth functions, remain incompletely explored. This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering, within a broad framework for non-convex smooth functions. Our findings show enhanced convergence guarantees for incremental gradient and single shuffle SGD. Particularly if $n$ is the training set size, we improve $n$ times the optimization term of convergence guarantee to reach accuracy $\epsilon$ from $O \left( \frac{n}{\epsilon} \right)$ to $O \left( \frac{1}{\epsilon}\right)$.}
}

Endnote

%0 Conference Paper
%T On Convergence of Incremental Gradient for Non-convex Smooth Functions
%A Anastasia Koloskova
%A Nikita Doikov
%A Sebastian U Stich
%A Martin Jaggi
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-koloskova24a
%I PMLR
%P 25058--25086
%U https://proceedings.mlr.press/v235/koloskova24a.html
%V 235
%X In machine learning and neural network optimization, algorithms like incremental gradient, single shuffle SGD, and random reshuffle SGD are popular due to their cache-mismatch efficiency and good practical convergence behavior. However, their optimization properties in theory, especially for non-convex smooth functions, remain incompletely explored. This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering, within a broad framework for non-convex smooth functions. Our findings show enhanced convergence guarantees for incremental gradient and single shuffle SGD. Particularly if $n$ is the training set size, we improve $n$ times the optimization term of convergence guarantee to reach accuracy $\epsilon$ from $O \left( \frac{n}{\epsilon} \right)$ to $O \left( \frac{1}{\epsilon}\right)$.

APA

Koloskova, A., Doikov, N., Stich, S.U. & Jaggi, M. (2024). On Convergence of Incremental Gradient for Non-convex Smooth Functions. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:25058-25086. Available from https://proceedings.mlr.press/v235/koloskova24a.html.