Towards Accurate Post-training Network Quantization via Bit-Split and Stitching

Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9847-9856, 2020.

Abstract

Network quantization is essential for deploying deep models to IoT devices due to its high efficiency. Most existing quantization approaches rely on the full training dataset and time-consuming fine-tuning to retain accuracy. Post-training quantization does not have these problems; however, it has mainly been shown effective for 8-bit quantization because of its simple optimization strategy. In this paper, we propose a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, and instance segmentation, with various network architectures. Specifically, Bit-split can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
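The abstract describes the method only at a high level. As a rough illustration of the underlying split-and-stitch idea (not the authors' exact algorithm), the NumPy sketch below performs simple symmetric INT3 post-training quantization of a weight tensor, splits the resulting integers into per-bit ternary components, and stitches them back. All function names and the ternary parameterization are assumptions made for illustration.

import numpy as np

def split_bits(q, num_bits=3):
    # Split signed integers q in [-(2^(b-1)-1), 2^(b-1)-1] into
    # (num_bits - 1) ternary components t_m in {-1, 0, +1} such that
    # q = sum_m 2^m * t_m.  Illustrative decomposition only; not
    # necessarily the paper's exact parameterization.
    sign = np.sign(q)
    mag = np.abs(q)
    bits = []
    for m in range(num_bits - 1):
        t = (mag >> m) & 1          # m-th binary digit of |q|
        bits.append(sign * t)       # re-attach the sign per element
    return bits                     # list of arrays with values in {-1, 0, +1}

def stitch_bits(bits):
    # Stitch the ternary components back into the integer tensor.
    return sum((1 << m) * t for m, t in enumerate(bits))

def quantize_int3(w):
    # Simple symmetric post-training INT3 quantization of a weight tensor:
    # choose the scale that maps the largest magnitude to 2^(3-1) - 1 = 3.
    qmax = 2 ** (3 - 1) - 1
    alpha = np.abs(w).max() / qmax               # per-tensor scale
    q = np.clip(np.round(w / alpha), -qmax, qmax).astype(np.int32)
    return alpha, q

# Toy check: split / stitch is lossless on the integer grid.
w = np.random.randn(64, 64).astype(np.float32)
alpha, q = quantize_int3(w)
assert np.array_equal(stitch_bits(split_bits(q)), q)
print("mean reconstruction error:", np.abs(w - alpha * q).mean())

In the full method, each bit would be optimized separately (e.g., against calibration data) before stitching; the sketch above only verifies that splitting and stitching are lossless on the integer grid.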

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wang20c,
  title     = {Towards Accurate Post-training Network Quantization via Bit-Split and Stitching},
  author    = {Wang, Peisong and Chen, Qiang and He, Xiangyu and Cheng, Jian},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9847--9856},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/wang20c/wang20c.pdf},
  url       = {https://proceedings.mlr.press/v119/wang20c.html},
  abstract  = {Network quantization is essential for deploying deep models to IoT devices due to its high efficiency. Most existing quantization approaches rely on the full training datasets and the time-consuming fine-tuning to retain accuracy. Post-training quantization does not have these problems, however, it has mainly been shown effective for 8-bit quantization due to the simple optimization strategy. In this paper, we propose a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, instance segmentation, with various network architectures. Specifically, Bit-split can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.}
}
Endnote
%0 Conference Paper
%T Towards Accurate Post-training Network Quantization via Bit-Split and Stitching
%A Peisong Wang
%A Qiang Chen
%A Xiangyu He
%A Jian Cheng
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-wang20c
%I PMLR
%P 9847--9856
%U https://proceedings.mlr.press/v119/wang20c.html
%V 119
%X Network quantization is essential for deploying deep models to IoT devices due to its high efficiency. Most existing quantization approaches rely on the full training datasets and the time-consuming fine-tuning to retain accuracy. Post-training quantization does not have these problems, however, it has mainly been shown effective for 8-bit quantization due to the simple optimization strategy. In this paper, we propose a Bit-Split and Stitching framework (Bit-split) for lower-bit post-training quantization with minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, instance segmentation, with various network architectures. Specifically, Bit-split can achieve near-original model performance even when quantizing FP32 models to INT3 without fine-tuning.
APA
Wang, P., Chen, Q., He, X., & Cheng, J. (2020). Towards Accurate Post-training Network Quantization via Bit-Split and Stitching. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9847-9856. Available from https://proceedings.mlr.press/v119/wang20c.html.
