SWALP : Stochastic Weight Averaging in Low Precision Training

Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Chris De Sa
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7015-7024, 2019.

Abstract

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
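The abstract describes SWALP at a high level: run SGD with all numbers quantized to low precision, and maintain an average of the iterates in higher precision. Below is a minimal sketch of that idea, not the authors' implementation: the fixed-point-style quantizer, the constant learning rate, and the names quantize, swalp_sketch, and grad_fn are illustrative assumptions (the paper uses block floating point quantization and a modified learning rate schedule).

    import torch

    def quantize(w, bits=8):
        # Hypothetical fixed-point quantizer for illustration only; the paper
        # uses block floating point, which differs in detail.
        scale = 2.0 ** (bits - 2)
        return torch.clamp(torch.round(w * scale),
                           -2 ** (bits - 1), 2 ** (bits - 1) - 1) / scale

    def swalp_sketch(w0, grad_fn, lr=0.01, steps=1000, avg_start=500, avg_freq=10):
        # All SGD state is kept quantized; only the running average is full precision.
        w = quantize(w0.clone())
        w_avg, n_avg = torch.zeros_like(w), 0
        for t in range(steps):
            g = quantize(grad_fn(w))          # low-precision stochastic gradient
            w = quantize(w - lr * g)          # low-precision SGD update
            if t >= avg_start and t % avg_freq == 0:
                # Accumulate the average of iterates in full precision.
                w_avg = (w_avg * n_avg + w) / (n_avg + 1)
                n_avg += 1
        return w_avg

    # Example: minimize a quadratic f(w) = 0.5 * ||w - target||^2 with noisy gradients.
    target = torch.tensor([0.5, -0.25])
    noisy_grad = lambda w: (w - target) + 0.05 * torch.randn_like(w)
    print(swalp_sketch(torch.zeros(2), noisy_grad))

On a quadratic objective like this, the averaged iterate lands much closer to the optimum than the individual quantized SGD iterates, which is the behavior the paper analyzes.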

Cite this Paper

BibTeX
@InProceedings{pmlr-v97-yang19d,
  title     = {{SWALP} : Stochastic Weight Averaging in Low Precision Training},
  author    = {Yang, Guandao and Zhang, Tianyi and Kirichenko, Polina and Bai, Junwen and Wilson, Andrew Gordon and De Sa, Chris},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7015--7024},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/yang19d/yang19d.pdf},
  url       = {https://proceedings.mlr.press/v97/yang19d.html},
  abstract  = {Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.}
}
Endnote
%0 Conference Paper
%T SWALP : Stochastic Weight Averaging in Low Precision Training
%A Guandao Yang
%A Tianyi Zhang
%A Polina Kirichenko
%A Junwen Bai
%A Andrew Gordon Wilson
%A Chris De Sa
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-yang19d
%I PMLR
%P 7015--7024
%U https://proceedings.mlr.press/v97/yang19d.html
%V 97
%X Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
APA
Yang, G., Zhang, T., Kirichenko, P., Bai, J., Wilson, A.G. & De Sa, C. (2019). SWALP : Stochastic Weight Averaging in Low Precision Training. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7015-7024. Available from https://proceedings.mlr.press/v97/yang19d.html.

Related Material