A Novel Differentiable Mixed-Precision Quantization Search Framework for Alleviating the Matthew Effect and Improving Robustness
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:1277-1292, 2023.
Abstract
Network quantization is an effective and widely used model compression technique. Recently, several works have applied differentiable neural architecture search (NAS) methods to mixed-precision quantization (MPQ) and achieved encouraging results. However, the nature of differentiable architecture search can lead to the Matthew effect in mixed-precision search: candidates with higher bit-widths are trained to maturity earlier, while candidates with lower bit-widths may never get the chance to express the desired function. To address this issue, we propose a novel mixed-precision quantization framework. The mixed-precision search is reformulated as a distribution learning problem, which alleviates the Matthew effect and improves generalization. Meanwhile, unlike generic differentiable NAS, the search space in mixed-precision quantization grows rapidly as the network depth increases. This makes the supernet harder to train and the search process unstable. To this end, we add a skip connection with a gradually decreasing architecture weight between convolutional layers in the supernet to improve robustness. The skip connection helps the optimization of the search process and does not participate in the bit-width competition. Extensive experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of the proposed methods. For example, when quantizing ResNet-50 on ImageNet, we achieve a state-of-the-art 156.10× Bitops compression rate while maintaining 75.87% accuracy.
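
To make the distribution-learning formulation concrete, below is a minimal PyTorch-style sketch of a convolution whose weight bit-width is drawn from a learned categorical distribution (here via Gumbel-softmax sampling, so every candidate keeps a chance of being trained). The candidate bit-widths, the uniform fake quantizer, and the sampling scheme are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: bit choices, quantizer, and Gumbel-softmax
# sampling are assumptions; the paper's exact formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w, bits):
    """Uniform symmetric fake quantization of a weight tensor (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


class MixedPrecisionConv(nn.Module):
    """Conv layer whose weight bit-width is drawn from a learned distribution."""

    def __init__(self, in_ch, out_ch, k=3, bit_choices=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.bit_choices = bit_choices
        # Logits of the categorical distribution over candidate bit-widths.
        self.bit_logits = nn.Parameter(torch.zeros(len(bit_choices)))

    def forward(self, x, tau=1.0):
        # Sample a soft one-hot over bit-widths; stochastic sampling (rather
        # than a fixed softmax mixture) lets low-bit candidates keep being
        # exercised, which is the intuition for alleviating the Matthew effect.
        probs = F.gumbel_softmax(self.bit_logits, tau=tau, hard=False)
        w = sum(p * fake_quantize(self.conv.weight, b)
                for p, b in zip(probs, self.bit_choices))
        return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)
```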
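
The auxiliary skip connection can be sketched similarly: an identity branch with a fixed, gradually decaying coefficient that stabilizes early supernet training but, being a schedule rather than a learned architecture parameter, never enters the bit-width competition. The linear decay and the wrapper interface below are assumptions made for illustration.

```python
# Illustrative sketch only: the decay schedule and where the skip attaches
# are assumptions, not the paper's exact design.
import torch.nn as nn


class ConvWithDecayingSkip(nn.Module):
    """Wraps a (quantized) conv block with a skip branch whose weight shrinks
    to zero during the search. Assumes the block preserves the input shape."""

    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x, step, total_steps):
        # The skip weight decays linearly from 1 to 0 on a fixed schedule,
        # so it aids early optimization without competing with bit-widths.
        alpha = max(0.0, 1.0 - step / total_steps)
        return self.block(x) + alpha * x
```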