Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness

Qi He, Peiran Yu, Ziyi Chen, Heng Huang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:22816-22835, 2025.

Abstract

Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove convergence rates for the nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes and under a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.
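
For context, the sketch below illustrates the general algorithm class the abstract refers to: a shuffling-type gradient method in its random reshuffling variant. It is a minimal illustration only, assuming a user-supplied component-gradient oracle grad_i and stepsize schedule lr_schedule; it does not reproduce the paper's specific stepsize strategy or its analysis beyond Lipschitz smoothness.

import numpy as np

def shuffling_gradient_method(grad_i, x0, n, epochs, lr_schedule, rng=None):
    """Generic shuffling-type gradient method (random reshuffling variant).

    grad_i(x, i): gradient of the i-th component function at x.
    lr_schedule(t): stepsize for epoch t (illustrative placeholder, not the
    paper's stepsize strategy).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for t in range(epochs):
        # Random reshuffling: draw a fresh permutation each epoch.
        # Using a fixed or adversarial permutation instead gives the
        # arbitrary-shuffling scheme.
        perm = rng.permutation(n)
        eta = lr_schedule(t)
        for i in perm:
            # One incremental step per component function.
            x = x - (eta / n) * grad_i(x, i)
    return x

# Illustrative usage: minimize (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
A = np.random.default_rng(0).standard_normal((100, 5))
b = A @ np.ones(5)
grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = shuffling_gradient_method(grad, np.zeros(5), n=100, epochs=50,
                                  lr_schedule=lambda t: 0.5 / (t + 1))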

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-he25u,
  title     = {Revisiting Convergence: Shuffling Complexity Beyond {L}ipschitz Smoothness},
  author    = {He, Qi and Yu, Peiran and Chen, Ziyi and Huang, Heng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {22816--22835},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/he25u/he25u.pdf},
  url       = {https://proceedings.mlr.press/v267/he25u.html},
  abstract  = {Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also matches the current best-known convergence rates, thereby broadening its applicability. We prove convergence rates for the nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes and under a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.}
}
APA
He, Q., Yu, P., Chen, Z. & Huang, H. (2025). Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:22816-22835. Available from https://proceedings.mlr.press/v267/he25u.html.