Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects

Ke Liang Xiao, Noah Marshall, Atish Agarwala, Elliot Paquette
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:68391-68439, 2025.

Abstract

In recent years, SignSGD has garnered interest both as a practical optimizer and as a simple model for understanding adaptive optimizers like Adam. Though there is a general consensus that SignSGD acts to precondition optimization and reshape noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of SignSGD in a high-dimensional limit and derive a limiting SDE and ODE that describe the risk. Using this framework, we quantify four effects of SignSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but goes beyond them by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.
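
For readers unfamiliar with the optimizer itself, the sketch below (not taken from the paper) illustrates the signSGD update rule against plain SGD on a toy noisy least-squares problem; the problem setup, variable names, and hyperparameters are illustrative assumptions only.

```python
# Minimal illustrative sketch (not from the paper): signSGD vs. SGD on a
# noisy least-squares problem. The signSGD update
#     w <- w - lr * sign(grad)
# takes a step of fixed size +/- lr in every coordinate regardless of the
# gradient's magnitude, which is the source of the preconditioning and
# noise-compression effects discussed in the abstract.
import numpy as np

rng = np.random.default_rng(0)
d, n, lr, steps = 50, 200, 0.01, 500

X = rng.standard_normal((n, d))                 # data matrix
w_star = rng.standard_normal(d)                 # ground-truth weights
y = X @ w_star + 0.5 * rng.standard_normal(n)   # noisy targets

def risk(w):
    """Mean-squared-error risk of weights w on the full dataset."""
    return 0.5 * np.mean((X @ w - y) ** 2)

w_sgd, w_sign = np.zeros(d), np.zeros(d)
for t in range(steps):
    i = rng.integers(n)                          # single-sample stochastic gradient
    g_sgd = (X[i] @ w_sgd - y[i]) * X[i]
    g_sign = (X[i] @ w_sign - y[i]) * X[i]
    w_sgd -= lr * g_sgd                          # plain SGD step
    w_sign -= lr * np.sign(g_sign)               # signSGD step: keep only the sign

print(f"final risk  SGD: {risk(w_sgd):.4f}   signSGD: {risk(w_sign):.4f}")
```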

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xiao25c,
  title     = {Exact risk curves of sign{SGD} in High-Dimensions: quantifying preconditioning and noise-compression effects},
  author    = {Xiao, Ke Liang and Marshall, Noah and Agarwala, Atish and Paquette, Elliot},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {68391--68439},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xiao25c/xiao25c.pdf},
  url       = {https://proceedings.mlr.press/v267/xiao25c.html},
  abstract  = {In recent years, SignSGD has garnered interest as both a practical optimizer as well as a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that SignSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of SignSGD in a high dimensional limit, and derive a limiting SDE and ODE to describe the risk. Using this framework we quantify four effects of SignSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.}
}
Endnote
%0 Conference Paper
%T Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects
%A Ke Liang Xiao
%A Noah Marshall
%A Atish Agarwala
%A Elliot Paquette
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xiao25c
%I PMLR
%P 68391--68439
%U https://proceedings.mlr.press/v267/xiao25c.html
%V 267
%X In recent years, SignSGD has garnered interest as both a practical optimizer as well as a simple model to understand adaptive optimizers like Adam. Though there is a general consensus that SignSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of SignSGD in a high dimensional limit, and derive a limiting SDE and ODE to describe the risk. Using this framework we quantify four effects of SignSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.
APA
Xiao, K.L., Marshall, N., Agarwala, A. & Paquette, E. (2025). Exact risk curves of signSGD in High-Dimensions: quantifying preconditioning and noise-compression effects. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:68391-68439. Available from https://proceedings.mlr.press/v267/xiao25c.html.