Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William Dally, Brucek Khailany
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:19123-19138, 2022.

Abstract

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training from scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning of BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4 to 6 bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.
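
The recursion itself is not reproduced on this page. As a rough illustration only, the following is a minimal NumPy sketch of an OCTAV-style Newton-Raphson (fixed-point) update for an MSE-optimal clipping scalar. It assumes a symmetric B-bit uniform quantizer whose mean squared error combines a quantization-noise term (roughly s^2 * 4^(-B) / 3 per unclipped element) with a clipping-distortion term ((|x| - s)^2 per clipped element); the helper names, initialization, and iteration count are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def octav_clipping_scalar(x, num_bits=4, num_iters=20, eps=1e-12):
    """Iteratively estimate an MSE-optimal clipping scalar for tensor x (sketch)."""
    mag = np.abs(np.ravel(x))
    s = mag.mean() + eps                  # any positive initialization
    coeff = 4.0 ** (-num_bits) / 3.0      # quantization-noise weight per unclipped element
    for _ in range(num_iters):
        clipped = mag > s                 # elements that threshold s would clip
        num = mag[clipped].sum()          # total magnitude of clipped elements
        den = coeff * (~clipped).sum() + clipped.sum()
        s = num / (den + eps)             # stationary point of the assumed MSE model
    return s

def quantize(x, s, num_bits=4):
    """Symmetric uniform quantization of x with clipping scalar s (sketch)."""
    levels = 2 ** (num_bits - 1) - 1
    step = s / levels
    return np.clip(np.round(x / step), -levels, levels) * step

# Per-tensor use, as in a QAT step: recompute s on the fly, then quantize.
x = np.random.randn(1 << 16)
s = octav_clipping_scalar(x, num_bits=4)
mse = np.mean((quantize(x, s, num_bits=4) - x) ** 2)
print(f"clipping scalar: {s:.4f}, quantization MSE: {mse:.6f}")

In actual QAT, a scalar of this kind would be recomputed for every weight and activation tensor at every training iteration, so that quantization noise is minimized at each step.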

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-sakr22a,
  title     = {Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training},
  author    = {Sakr, Charbel and Dai, Steve and Venkatesan, Rangharajan and Zimmer, Brian and Dally, William and Khailany, Brucek},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {19123--19138},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/sakr22a/sakr22a.pdf},
  url       = {https://proceedings.mlr.press/v162/sakr22a.html},
  abstract  = {Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training from scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning of BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4 to 6 bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.}
}
Endnote
%0 Conference Paper
%T Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
%A Charbel Sakr
%A Steve Dai
%A Rangharajan Venkatesan
%A Brian Zimmer
%A William Dally
%A Brucek Khailany
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-sakr22a
%I PMLR
%P 19123--19138
%U https://proceedings.mlr.press/v162/sakr22a.html
%V 162
%X Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training from scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning of BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4 to 6 bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.
APA
Sakr, C., Dai, S., Venkatesan, R., Zimmer, B., Dally, W. & Khailany, B. (2022). Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:19123-19138. Available from https://proceedings.mlr.press/v162/sakr22a.html.
