Improving Stochastic Cubic Newton with Momentum

El Mahdi Chayti, Nikita Doikov, Martin Jaggi
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1441-1449, 2025.

Abstract

We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton’s method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
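For concreteness, the following is a minimal LaTeX sketch of the kind of update the abstract describes: momentum (moving-average) estimates of the gradient and Hessian plugged into a cubically regularized Newton step. The weights \(\alpha_t, \beta_t\), the regularization constant \(M\), and the use of a plain exponential moving average are illustrative assumptions here; the paper's "special version of momentum" may differ in its exact form.

\[
g_t = (1-\alpha_t)\, g_{t-1} + \alpha_t \nabla f(x_t, \xi_t),
\qquad
H_t = (1-\beta_t)\, H_{t-1} + \beta_t \nabla^2 f(x_t, \zeta_t),
\]
\[
x_{t+1} \in \arg\min_{y} \;\Big\{ \langle g_t,\, y - x_t \rangle
+ \tfrac{1}{2} \langle H_t (y - x_t),\, y - x_t \rangle
+ \tfrac{M}{6} \| y - x_t \|^3 \Big\},
\]

where \(\xi_t, \zeta_t\) denote the stochastic samples drawn at iteration \(t\). Averaging over past iterates is what reduces the variance of the estimates \(g_t\) and \(H_t\), which is consistent with the abstract's claim that the method converges even with a single stochastic sample per iteration.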

Cite this Paper

BibTeX
@InProceedings{pmlr-v258-chayti25a,
  title     = {Improving Stochastic Cubic Newton with Momentum},
  author    = {Chayti, El Mahdi and Doikov, Nikita and Jaggi, Martin},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1441--1449},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/chayti25a/chayti25a.pdf},
  url       = {https://proceedings.mlr.press/v258/chayti25a.html},
  abstract  = {We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.}
}
Endnote
%0 Conference Paper
%T Improving Stochastic Cubic Newton with Momentum
%A El Mahdi Chayti
%A Nikita Doikov
%A Martin Jaggi
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-chayti25a
%I PMLR
%P 1441--1449
%U https://proceedings.mlr.press/v258/chayti25a.html
%V 258
%X We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
APA
Chayti, E.M., Doikov, N. & Jaggi, M. (2025). Improving Stochastic Cubic Newton with Momentum. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1441-1449. Available from https://proceedings.mlr.press/v258/chayti25a.html.
