Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Zhaoyang Zhang, Wenqi Shao, Jinwei Gu, Xiaogang Wang, Ping Luo
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12546-12556, 2021.

Abstract

Model quantization is challenging due to many tedious hyper-parameters such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior art that carefully tunes these values, we present a fully differentiable approach that learns all of them, named Differentiable Dynamic Quantization (DDQ), which has several benefits. (1) DDQ is able to quantize challenging lightweight architectures like MobileNets, where different layers prefer different quantization parameters. (2) DDQ is hardware-friendly and can be easily implemented using low-precision matrix-vector multiplication, making it deployable on a wide range of hardware such as ARM. (3) Extensive experiments show that DDQ outperforms prior art on many networks and benchmarks, especially when models are already efficient and compact. For example, DDQ is the first approach to achieve lossless 4-bit quantization of MobileNetV2 on ImageNet.
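
To make the idea of learning quantization hyper-parameters by gradient descent concrete, below is a minimal PyTorch sketch of a uniform quantizer whose dynamic range (and hence stepsize) is a learnable parameter trained through a straight-through estimator. This is an illustrative PACT/LSQ-style toy, not the authors' DDQ implementation: the class name, the fixed 4-bit precision, and the symmetric range [-alpha, alpha] are assumptions made for the example, whereas DDQ additionally learns the bitwidth (mixed precision) and supports non-uniform stepsizes (adaptive resolution).

import torch
import torch.nn as nn

class LearnableUniformQuantizer(nn.Module):
    """Toy uniform quantizer with a learnable dynamic range.

    Hypothetical sketch: only the range alpha is learned here; the
    stepsize follows as 2 * alpha / (num_levels - 1). DDQ's actual
    formulation is richer (learned bitwidth, adaptive resolution).
    """
    def __init__(self, num_bits=4, init_range=1.0):
        super().__init__()
        self.num_levels = 2 ** num_bits
        # Learnable symmetric dynamic range [-alpha, alpha].
        self.alpha = nn.Parameter(torch.tensor(init_range))

    def forward(self, x):
        alpha = torch.abs(self.alpha)            # keep the range positive
        step = 2 * alpha / (self.num_levels - 1)
        # Clamp to the learned range; gradients reach alpha through
        # the clamping boundaries (as in PACT).
        x_clamped = torch.maximum(torch.minimum(x, alpha), -alpha)
        x_quant = torch.round(x_clamped / step) * step
        # Straight-through estimator: the forward pass uses the quantized
        # value, while the backward pass treats rounding as the identity.
        return x_clamped + (x_quant - x_clamped).detach()

if __name__ == "__main__":
    quantizer = LearnableUniformQuantizer(num_bits=4)
    w = torch.randn(8, requires_grad=True)
    loss = quantizer(w).pow(2).sum()
    loss.backward()                 # gradients flow to both w and alpha
    print(quantizer.alpha.grad)     # the dynamic range itself gets a gradient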

Cite this Paper

BibTeX
@InProceedings{pmlr-v139-zhang21r,
  title     = {Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution},
  author    = {Zhang, Zhaoyang and Shao, Wenqi and Gu, Jinwei and Wang, Xiaogang and Luo, Ping},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12546--12556},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhang21r/zhang21r.pdf},
  url       = {https://proceedings.mlr.press/v139/zhang21r.html}
}
Endnote
%0 Conference Paper
%T Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
%A Zhaoyang Zhang
%A Wenqi Shao
%A Jinwei Gu
%A Xiaogang Wang
%A Ping Luo
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhang21r
%I PMLR
%P 12546--12556
%U https://proceedings.mlr.press/v139/zhang21r.html
%V 139
APA
Zhang, Z., Shao, W., Gu, J., Wang, X. & Luo, P. (2021). Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12546-12556. Available from https://proceedings.mlr.press/v139/zhang21r.html.
