A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:1277-1292, 2023.
Abstract
Network quantization is an effective and widely used model compression technique. Recently, several works have applied differentiable neural architecture search (NAS) methods to mixed-precision quantization (MPQ) and achieved encouraging results. However, the nature of differentiable architecture search can lead to the Matthew effect in mixed-precision search: candidates with higher bit-widths are trained to maturity earlier, while candidates with lower bit-widths may never get the chance to express the desired function. To address this issue, we propose a novel mixed-precision quantization framework. The mixed-precision search is reformulated as a distribution learning problem, which alleviates the Matthew effect and improves generalization. Meanwhile, unlike generic differentiable NAS, the search space in mixed-precision quantization grows rapidly as the network depth increases. This makes the supernet harder to train and the search process unstable. To this end, we add a skip connection with a gradually decreasing architecture weight between convolutional layers in the supernet to improve robustness. The skip connection helps the optimization of the search process and does not participate in the bit-width competition. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed methods. For example, when quantizing ResNet-50 on ImageNet, we achieve a state-of-the-art 156.10× Bitops compression rate while maintaining 75.87% accuracy.
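
To make the distribution-learning formulation concrete, below is a minimal PyTorch-style sketch of a convolution whose weight bit-width is drawn from a learned categorical distribution (here via Gumbel-softmax sampling, so every candidate keeps a chance of being trained). The candidate bit-widths, the uniform fake quantizer, and the sampling scheme are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: bit choices, quantizer, and Gumbel-softmax
# sampling are assumptions; the paper's exact formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w, bits):
    """Uniform symmetric fake quantization of a weight tensor (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


class MixedPrecisionConv(nn.Module):
    """Conv layer whose weight bit-width is drawn from a learned distribution."""

    def __init__(self, in_ch, out_ch, k=3, bit_choices=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.bit_choices = bit_choices
        # Logits of the categorical distribution over candidate bit-widths.
        self.bit_logits = nn.Parameter(torch.zeros(len(bit_choices)))

    def forward(self, x, tau=1.0):
        # Sample a soft one-hot over bit-widths; stochastic sampling (rather
        # than a fixed softmax mixture) lets low-bit candidates keep being
        # exercised, which is the intuition for alleviating the Matthew effect.
        probs = F.gumbel_softmax(self.bit_logits, tau=tau, hard=False)
        w = sum(p * fake_quantize(self.conv.weight, b)
                for p, b in zip(probs, self.bit_choices))
        return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)
```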
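
The auxiliary skip connection can be sketched similarly: an identity branch with a fixed, gradually decaying coefficient that stabilizes early supernet training but, being a schedule rather than a learned architecture parameter, never enters the bit-width competition. The linear decay and the wrapper interface below are assumptions made for illustration.

```python
# Illustrative sketch only: the decay schedule and where the skip attaches
# are assumptions, not the paper's exact design.
import torch.nn as nn


class ConvWithDecayingSkip(nn.Module):
    """Wraps a (quantized) conv block with a skip branch whose weight shrinks
    to zero during the search. Assumes the block preserves the input shape."""

    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x, step, total_steps):
        # The skip weight decays linearly from 1 to 0 on a fixed schedule,
        # so it aids early optimization without competing with bit-widths.
        alpha = max(0.0, 1.0 - step / total_steps)
        return self.block(x) + alpha * x
```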