Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3304-3314, 2020.

Abstract

Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently with quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost. However, existing methods mostly use a uniform mechanism that quantizes the values evenly. Such a scheme may cause a large quantization variance and slow down convergence in practice. In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values by a family of Weibull distributions and searches for “quantization knobs” that minimize the quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs of varying depths, shedding light on the number of bits required for convergence. Experiments show that TinyScript consistently obtains lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than its uniform-based counterpart.
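
The non-uniform idea can be made concrete with a small numerical sketch. The Python snippet below is not the TinyScript algorithm from the paper: instead of fitting Weibull distributions and searching for the variance-minimizing “quantization knobs”, it places the non-uniform levels at empirical quantiles of the data as a simple stand-in, then compares the mean squared error of an unbiased stochastic quantizer using uniformly spaced levels versus quantile-placed levels on a heavy-tailed sample.

# Hedged illustration of uniform vs. non-uniform quantization.
# NOT the TinyScript algorithm: non-uniform levels are placed at empirical
# quantiles as a stand-in for the paper's Weibull-based level search.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(x, levels):
    """Unbiased stochastic rounding of x onto a sorted 1-D array of levels."""
    levels = np.sort(levels)
    x = np.clip(x, levels[0], levels[-1])
    hi_idx = np.searchsorted(levels, x, side="left")
    hi_idx = np.clip(hi_idx, 1, len(levels) - 1)
    lo, hi = levels[hi_idx - 1], levels[hi_idx]
    # Round up with probability proportional to the distance from the lower
    # level, so that E[q(x)] = x (an unbiased quantizer).
    p_up = np.where(hi > lo, (x - lo) / np.maximum(hi - lo, 1e-12), 0.0)
    return np.where(rng.random(x.shape) < p_up, hi, lo)

# Skewed, heavy-tailed magnitudes (gradient/activation magnitudes often look like this).
x = rng.weibull(0.7, size=100_000).astype(np.float32)

bits = 3
n_levels = 2 ** bits
uniform_levels = np.linspace(x.min(), x.max(), n_levels)
quantile_levels = np.quantile(x, np.linspace(0.0, 1.0, n_levels))

for name, levels in [("uniform", uniform_levels), ("quantile", quantile_levels)]:
    q = stochastic_quantize(x, levels)
    print(f"{name:>8} levels: mean squared quantization error = {np.mean((q - x) ** 2):.6f}")

On skewed data, the quantile-placed levels concentrate where most of the values lie and therefore yield a much smaller quantization error at the same bit width, which is the same intuition behind choosing non-uniform levels to minimize quantization variance.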

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-fu20c,
  title     = {Don’t Waste Your Bits! {S}queeze Activations and Gradients for Deep Neural Networks via {T}iny{S}cript},
  author    = {Fu, Fangcheng and Hu, Yuzheng and He, Yihan and Jiang, Jiawei and Shao, Yingxia and Zhang, Ce and Cui, Bin},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {3304--3314},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/fu20c/fu20c.pdf},
  url       = {https://proceedings.mlr.press/v119/fu20c.html},
  abstract  = {Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently with quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost. However, existing methods mostly use a uniform mechanism that quantizes the values evenly. Such a scheme may cause a large quantization variance and slow down convergence in practice. In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values by a family of Weibull distributions and searches for “quantization knobs” that minimize the quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs of varying depths, shedding light on the number of bits required for convergence. Experiments show that TinyScript consistently obtains lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than its uniform-based counterpart.}
}
Endnote
%0 Conference Paper
%T Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript
%A Fangcheng Fu
%A Yuzheng Hu
%A Yihan He
%A Jiawei Jiang
%A Yingxia Shao
%A Ce Zhang
%A Bin Cui
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-fu20c
%I PMLR
%P 3304--3314
%U https://proceedings.mlr.press/v119/fu20c.html
%V 119
%X Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently with quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost. However, existing methods mostly use a uniform mechanism that quantizes the values evenly. Such a scheme may cause a large quantization variance and slow down convergence in practice. In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values by a family of Weibull distributions and searches for “quantization knobs” that minimize the quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs of varying depths, shedding light on the number of bits required for convergence. Experiments show that TinyScript consistently obtains lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than its uniform-based counterpart.
APA
Fu, F., Hu, Y., He, Y., Jiang, J., Shao, Y., Zhang, C. & Cui, B. (2020). Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3304-3314. Available from https://proceedings.mlr.press/v119/fu20c.html.
