A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers

Kengo Matsumoto, Tomoya Matsuda, Atsuki Inoue, Hiroshi Kawaguchi, Yasufumi Sakai
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:882-894, 2024.

Abstract

Reducing the memory usage and computational complexity of high-performance deep neural networks while minimizing the degradation of accuracy is a key issue in deploying these models on edge devices. To address this issue, partial quantization methods have been proposed that quantize only a subset of the weight parameters of a neural network model. However, the accuracy of existing methods degrades rapidly as the compression ratio increases. Although retraining can compensate for this degradation to some extent, it is computationally very expensive. In this study, we propose a mixed-precision quantization algorithm that requires neither retraining nor any sacrifice in accuracy. In the proposed method, the difference between the loss values after and before quantization is first calculated for every channel in each layer of the pretrained model. Next, the channels of each layer are divided into two groups, called semilayers, according to whether this loss difference is positive or negative. The quantization priority of each semilayer is then determined from the Kullback-Leibler divergence between the probability distributions of the softmax output after and before quantization. The same procedure is repeated while gradually decreasing the bitwidth, for example with 8-, 6-, and then 4-bit quantization, yielding a mixed-precision model. The results of an experimental evaluation show that the proposed method compressed a ResNet-18 model by 81.44%, a ResNet-34 model by 84.25%, and a ResNet-50 model by 80.39% on image classification with the ImageNet dataset, and a ResNet-18 model by 80.56% on image classification with the CIFAR-10 dataset, with no degradation of the inference accuracy of the pretrained models.
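
To make the procedure concrete, the following Python sketch illustrates the two core steps on a single layer: splitting its output channels into two semilayers by the sign of the per-channel loss difference, and scoring a semilayer's quantization priority by the Kullback-Leibler divergence between the softmax outputs before and after quantization. This is a minimal illustration, not the authors' implementation; the uniform symmetric per-tensor quantizer, the single calibration batch, and all function and layer names (quantize_weight, split_into_semilayers, kl_priority, "layer1.0.conv1") are assumptions made for the example.

import torch
import torch.nn.functional as F


def quantize_weight(w, bits):
    # Assumed quantizer: uniform symmetric quantization to `bits` bits.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-12) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


def split_into_semilayers(model, layer_name, x, y, bits):
    # Split the output channels of one layer into two semilayers according
    # to the sign of the loss difference caused by quantizing each channel.
    layer = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        base_loss = F.cross_entropy(model(x), y).item()
        positive, negative = [], []
        for c in range(layer.weight.shape[0]):
            original = layer.weight.data[c].clone()
            layer.weight.data[c] = quantize_weight(original, bits)
            delta = F.cross_entropy(model(x), y).item() - base_loss
            layer.weight.data[c] = original  # restore full precision
            (positive if delta > 0 else negative).append(c)
    return positive, negative


def kl_priority(model, layer_name, channels, x, bits):
    # Quantization priority of one semilayer: KL divergence between the
    # softmax outputs before and after quantizing its channels.
    layer = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)
        original = layer.weight.data[channels].clone()
        layer.weight.data[channels] = quantize_weight(original, bits)
        log_q = F.log_softmax(model(x), dim=1)
        layer.weight.data[channels] = original  # restore full precision
    return F.kl_div(log_q, p, reduction="batchmean").item()


if __name__ == "__main__":
    # Toy usage with torchvision's ResNet-18 and a random stand-in batch;
    # a real run would use calibration images from the training set.
    from torchvision.models import resnet18
    model = resnet18(weights=None).eval()
    x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 1000, (4,))
    pos, neg = split_into_semilayers(model, "layer1.0.conv1", x, y, bits=8)
    print("priority of positive semilayer:",
          kl_priority(model, "layer1.0.conv1", pos, x, bits=8))

In the full method as described in the abstract, semilayers would be quantized in order of this priority, and the pass repeated at progressively lower bitwidths (e.g., 8, 6, then 4 bits) as long as the inference accuracy of the pretrained model is preserved.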

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-matsumoto24a,
  title     = {A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers},
  author    = {Matsumoto, Kengo and Matsuda, Tomoya and Inoue, Atsuki and Kawaguchi, Hiroshi and Sakai, Yasufumi},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages     = {882--894},
  year      = {2024},
  editor    = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume    = {222},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v222/matsumoto24a/matsumoto24a.pdf},
  url       = {https://proceedings.mlr.press/v222/matsumoto24a.html}
}
Endnote
%0 Conference Paper
%T A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers
%A Kengo Matsumoto
%A Tomoya Matsuda
%A Atsuki Inoue
%A Hiroshi Kawaguchi
%A Yasufumi Sakai
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-matsumoto24a
%I PMLR
%P 882--894
%U https://proceedings.mlr.press/v222/matsumoto24a.html
%V 222
APA
Matsumoto, K., Matsuda, T., Inoue, A., Kawaguchi, H. & Sakai, Y. (2024). A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:882-894. Available from https://proceedings.mlr.press/v222/matsumoto24a.html.
