Low-Precision Stochastic Gradient Langevin Dynamics

Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:26624-26644, 2022.

Abstract

While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
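To make the setup concrete, below is a minimal NumPy sketch (not the authors' released code) of an SGLD update that keeps a full-precision weight accumulator and exposes a stochastically rounded low-precision copy for gradient computation. The helper names, the fixed-point grid spacing delta, and the use of plain stochastic rounding in place of the paper's variance-preserving quantization function are all illustrative assumptions.

import numpy as np

def stochastic_round(x, delta):
    # Unbiased (stochastic) rounding to a fixed-point grid with spacing delta:
    # round up with probability equal to the distance to the lower grid point,
    # so E[stochastic_round(x, delta)] = x.
    scaled = x / delta
    low = np.floor(scaled)
    prob_up = scaled - low
    return delta * (low + (np.random.rand(*x.shape) < prob_up))

def sgld_step(w_acc, grad, step_size, temperature=1.0, delta=2.0**-6):
    # One SGLD step with a full-precision gradient accumulator: w_acc stays in
    # full precision, and only the copy returned for forward/backward passes
    # is quantized to low precision (illustrative sketch, not the paper's exact scheme).
    noise = np.sqrt(2.0 * step_size * temperature) * np.random.randn(*w_acc.shape)
    w_acc = w_acc - step_size * grad + noise
    w_low = stochastic_round(w_acc, delta)
    return w_acc, w_low

# Toy usage: sample from a standard Gaussian target, U(w) = w^2 / 2, so grad U(w) = w.
w_acc = np.zeros(1)
w_low = stochastic_round(w_acc, 2.0**-6)
samples = []
for _ in range(20000):
    grad = w_low  # gradient evaluated at the low-precision weights
    w_acc, w_low = sgld_step(w_acc, grad, step_size=1e-2)
    samples.append(w_low[0])
print("mean %.3f  var %.3f  (target: 0, 1)" % (np.mean(samples), np.var(samples)))

Stochastic rounding is used here because it is unbiased, so the quantization error acts as additional zero-mean noise that the Langevin dynamics can absorb; this is one way to read the abstract's point about SGLD's intrinsic ability to handle system noise, though the paper's actual variance-corrected quantizer differs from this sketch.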

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhang22ag,
  title     = {Low-Precision Stochastic Gradient {L}angevin Dynamics},
  author    = {Zhang, Ruqi and Wilson, Andrew Gordon and De Sa, Christopher},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {26624--26644},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zhang22ag/zhang22ag.pdf},
  url       = {https://proceedings.mlr.press/v162/zhang22ag.html},
  abstract  = {While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.}
}
Endnote
%0 Conference Paper
%T Low-Precision Stochastic Gradient Langevin Dynamics
%A Ruqi Zhang
%A Andrew Gordon Wilson
%A Christopher De Sa
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-zhang22ag
%I PMLR
%P 26624--26644
%U https://proceedings.mlr.press/v162/zhang22ag.html
%V 162
%X While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.
APA
Zhang, R., Wilson, A. G., & De Sa, C. (2022). Low-Precision Stochastic Gradient Langevin Dynamics. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:26624-26644. Available from https://proceedings.mlr.press/v162/zhang22ag.html.
