Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs

Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4087-4095, 2025.

Abstract

Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and heavy-tailed noise. Additionally, we propose new scaling rules for hyperparameter tuning to mitigate performance degradation due to compression. These findings are empirically validated across multiple deep learning architectures and datasets, providing practical recommendations for distributed optimization.
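To make the comparison in the abstract concrete, here is a minimal Python sketch of one server round of each method. The random-k compressor, the d/k rescaling, and the majority-vote aggregation are illustrative assumptions for a generic unbiased compressor and a generic sign-based scheme, not the paper's exact formulation.

import numpy as np

def unbiased_random_k(grad, k, rng):
    """Random-k sparsification rescaled by d/k so that E[C(g)] = g (unbiased)."""
    d = grad.size
    mask = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    mask[idx] = d / k
    return grad * mask

def dcsgd_step(x, per_worker_grads, lr, k, rng):
    """One DCSGD round: each worker sends an unbiasedly compressed gradient;
    the server averages the compressed gradients and takes a step."""
    compressed = [unbiased_random_k(g, k, rng) for g in per_worker_grads]
    return x - lr * np.mean(compressed, axis=0)

def dsignsgd_step(x, per_worker_grads, lr):
    """One DSignSGD round (majority vote): each worker sends sign(g);
    the server steps along the sign of the aggregated votes."""
    vote = np.sum([np.sign(g) for g in per_worker_grads], axis=0)
    return x - lr * np.sign(vote)

The d/k rescaling keeps the compressor unbiased but inflates its variance, which is consistent with the abstract's point that unbiased compression is more sensitive to gradient noise, whereas the sign update is bounded regardless of the noise magnitude.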

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-compagnoni25a,
  title     = {Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs},
  author    = {Compagnoni, Enea Monzio and Islamov, Rustem and Proske, Frank Norbert and Lucchi, Aurelien},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4087--4095},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/compagnoni25a/compagnoni25a.pdf},
  url       = {https://proceedings.mlr.press/v258/compagnoni25a.html},
  abstract  = {Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and heavy-tailed noise. Additionally, we propose new scaling rules for hyperparameter tuning to mitigate performance degradation due to compression. These findings are empirically validated across multiple deep learning architectures and datasets, providing practical recommendations for distributed optimization.}
}
Endnote
%0 Conference Paper
%T Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
%A Enea Monzio Compagnoni
%A Rustem Islamov
%A Frank Norbert Proske
%A Aurelien Lucchi
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-compagnoni25a
%I PMLR
%P 4087--4095
%U https://proceedings.mlr.press/v258/compagnoni25a.html
%V 258
%X Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and heavy-tailed noise. Additionally, we propose new scaling rules for hyperparameter tuning to mitigate performance degradation due to compression. These findings are empirically validated across multiple deep learning architectures and datasets, providing practical recommendations for distributed optimization.
APA
Compagnoni, E.M., Islamov, R., Proske, F.N. & Lucchi, A. (2025). Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4087-4095. Available from https://proceedings.mlr.press/v258/compagnoni25a.html.