A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness

Hengyi Zhou, Hongyi He, Wanchen Liu, Yuhai Li, Haonan Zhang, Longjun Liu
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:1277-1292, 2023.

Abstract

Network quantization is an effective and widely used model compression technique. Recently, several works apply differentiable neural architecture search (NAS) methods to mixed-precision quantization (MPQ) and achieve encouraging results. However, the nature of differentiable architecture search can lead to the Matthew Effect in mixed-precision search: candidates with higher bit-widths are trained to maturity earlier, while candidates with lower bit-widths may never have the chance to express the desired function. To address this issue, we propose a novel mixed-precision quantization framework. The mixed-precision search is formulated as a distribution learning problem, which alleviates the Matthew Effect and improves generalization. Meanwhile, unlike generic differentiable NAS, the search space in mixed-precision quantization search grows rapidly as network depth increases, which makes the supernet harder to train and the search process unstable. To this end, we add a skip connection with a gradually decreasing architecture weight between convolutional layers in the supernet to improve robustness. The skip connection aids the optimization of the search process and does not participate in the bit-width competition. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed methods. For example, when quantizing ResNet-50 on ImageNet, we achieve a state-of-the-art 156.10x Bitops compression rate while maintaining 75.87% accuracy.
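
The abstract describes two mechanisms only at a high level: treating the per-layer bit-width choice as a learned distribution rather than a hard competition, and adding an auxiliary skip connection whose weight decays over the search. The paper's exact formulation is not reproduced on this page, so the following is only a minimal illustrative sketch, assuming a Gumbel-softmax parameterization of the bit-width distribution, a simple symmetric fake-quantizer, and a linearly decaying skip weight; the names (MPQConv, fake_quant, skip_decay) are hypothetical and not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fake_quant(w, bits):
        # Symmetric uniform fake-quantization (illustrative only).
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    class MPQConv(nn.Module):
        # One supernet layer: candidate bit-widths share the same weights,
        # their mixture is governed by a learned categorical distribution,
        # and an auxiliary skip branch is weighted by an external schedule.
        def __init__(self, in_ch, out_ch, bit_choices=(2, 4, 8)):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.bit_choices = bit_choices
            # Logits of the distribution over candidate bit-widths.
            self.bit_logits = nn.Parameter(torch.zeros(len(bit_choices)))
            # The skip branch has no architecture logit of its own, so it
            # never competes with the bit-width candidates.
            self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

        def forward(self, x, tau=1.0, skip_weight=0.0):
            # Sample a soft one-hot bit-width assignment; gradients flow back
            # to bit_logits, so the distribution itself is what gets learned.
            probs = F.gumbel_softmax(self.bit_logits, tau=tau, hard=False)
            out = 0.0
            for p, bits in zip(probs, self.bit_choices):
                w_q = fake_quant(self.conv.weight, bits)
                out = out + p * F.conv2d(x, w_q, self.conv.bias, padding=1)
            # Auxiliary skip connection with a scheduled, gradually decaying weight.
            return out + skip_weight * self.skip(x)

    def skip_decay(step, total_steps):
        # Example schedule: skip weight decays from 1 to 0 so that, by the end
        # of the search, only the quantized candidates remain.
        return max(0.0, 1.0 - step / total_steps)

A training loop would anneal tau and pass skip_decay(step, total_steps) as skip_weight at each iteration; whether the paper uses this particular schedule, or a Gumbel-softmax at all, is an assumption of the sketch rather than a statement of the authors' method.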

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-zhou23a,
  title     = {A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness},
  author    = {Zhou, Hengyi and He, Hongyi and Liu, Wanchen and Li, Yuhai and Zhang, Haonan and Liu, Longjun},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {1277--1292},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/zhou23a/zhou23a.pdf},
  url       = {https://proceedings.mlr.press/v189/zhou23a.html},
  abstract  = {Network quantization is an effective and widely used model compression technique. Recently, several works apply differentiable neural architecture search (NAS) methods to mixed-precision quantization (MPQ) and achieve encouraging results. However, the nature of differentiable architecture search can lead to the Matthew Effect in mixed-precision search: candidates with higher bit-widths are trained to maturity earlier, while candidates with lower bit-widths may never have the chance to express the desired function. To address this issue, we propose a novel mixed-precision quantization framework. The mixed-precision search is formulated as a distribution learning problem, which alleviates the Matthew Effect and improves generalization. Meanwhile, unlike generic differentiable NAS, the search space in mixed-precision quantization search grows rapidly as network depth increases, which makes the supernet harder to train and the search process unstable. To this end, we add a skip connection with a gradually decreasing architecture weight between convolutional layers in the supernet to improve robustness. The skip connection aids the optimization of the search process and does not participate in the bit-width competition. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed methods. For example, when quantizing ResNet-50 on ImageNet, we achieve a state-of-the-art 156.10x Bitops compression rate while maintaining 75.87\% accuracy.}
}
Endnote
%0 Conference Paper
%T A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness
%A Hengyi Zhou
%A Hongyi He
%A Wanchen Liu
%A Yuhai Li
%A Haonan Zhang
%A Longjun Liu
%B Proceedings of The 14th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen
%F pmlr-v189-zhou23a
%I PMLR
%P 1277--1292
%U https://proceedings.mlr.press/v189/zhou23a.html
%V 189
%X Network quantization is an effective and widely used model compression technique. Recently, several works apply differentiable neural architecture search (NAS) methods to mixed-precision quantization (MPQ) and achieve encouraging results. However, the nature of differentiable architecture search can lead to the Matthew Effect in mixed-precision search: candidates with higher bit-widths are trained to maturity earlier, while candidates with lower bit-widths may never have the chance to express the desired function. To address this issue, we propose a novel mixed-precision quantization framework. The mixed-precision search is formulated as a distribution learning problem, which alleviates the Matthew Effect and improves generalization. Meanwhile, unlike generic differentiable NAS, the search space in mixed-precision quantization search grows rapidly as network depth increases, which makes the supernet harder to train and the search process unstable. To this end, we add a skip connection with a gradually decreasing architecture weight between convolutional layers in the supernet to improve robustness. The skip connection aids the optimization of the search process and does not participate in the bit-width competition. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed methods. For example, when quantizing ResNet-50 on ImageNet, we achieve a state-of-the-art 156.10x Bitops compression rate while maintaining 75.87% accuracy.
APA
Zhou, H., He, H., Liu, W., Li, Y., Zhang, H. & Liu, L. (2023). A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:1277-1292. Available from https://proceedings.mlr.press/v189/zhou23a.html.
