AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

Li Ding, Wen Fei, Yuyang Huang, Shuangrui Ding, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10960-10977, 2024.

Abstract

Low-bit integer training has emerged as a promising approach to mitigating the heavy computational burden of network training by quantizing the weights, activations, and gradients. However, existing methods cannot effectively achieve mixed-precision quantization for low-bit training and are commonly limited to INT8 precision. In this paper, we propose a novel low-bit integer training framework that, for the first time, achieves adaptive mixed-precision allocation (AMPA) for weights, activations, and gradients, and pushes the boundaries to a precision level below INT8. We develop a novel magnitude-based sensitivity measurement that combines the quantization losses of weights, activations, and gradients with the average gradient magnitudes, and we show in theory that it upper-bounds the influence of quantization. We further design a layer-wise precision update strategy based on observations of the quantization losses and their effects on model performance in low-bit training. Extensive experiments on different backbones and datasets show that, compared to INT8 quantization, the proposed method achieves more than 38% BitOPs reduction with a tolerable loss below 2% in image classification, image segmentation, and language modeling.
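The abstract only sketches the magnitude-based sensitivity measure at a high level. As a rough illustration, the sketch below assumes that a per-layer sensitivity score is the quantization loss (here taken as the MSE between a tensor and a symmetric uniformly quantized copy) scaled by the average gradient magnitude, and that layers with the lowest scores are candidates for fewer bits. The function names, the uniform quantizer, and the toy layers are illustrative assumptions, not the paper's implementation.

# Hypothetical sketch: per-layer sensitivity as (quantization loss) x (mean |gradient|).
# The exact formula and names are assumptions based on the abstract, not the paper's code.
import torch

def uniform_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    # Symmetric uniform quantization to num_bits (illustrative quantizer).
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

def layer_sensitivity(weight: torch.Tensor, grad: torch.Tensor, num_bits: int) -> float:
    # Sensitivity proxy: quantization MSE scaled by the average gradient magnitude.
    q_loss = torch.mean((weight - uniform_quantize(weight, num_bits)) ** 2)
    return (q_loss * grad.abs().mean()).item()

# Toy usage: rank layers by sensitivity; the least sensitive layers could be
# assigned lower precision in a layer-wise precision update step.
layers = {"conv1": torch.randn(64, 3, 3, 3), "fc": torch.randn(10, 512)}
grads = {name: torch.randn_like(w) for name, w in layers.items()}
scores = {name: layer_sensitivity(w, grads[name], num_bits=6) for name, w in layers.items()}
print(sorted(scores, key=scores.get))  # least sensitive layers listed first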

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ding24b,
  title     = {{AMPA}: Adaptive Mixed Precision Allocation for Low-Bit Integer Training},
  author    = {Ding, Li and Fei, Wen and Huang, Yuyang and Ding, Shuangrui and Dai, Wenrui and Li, Chenglin and Zou, Junni and Xiong, Hongkai},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {10960--10977},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24b/ding24b.pdf},
  url       = {https://proceedings.mlr.press/v235/ding24b.html},
  abstract  = {Low-bit integer training emerges as a promising approach to mitigate the heavy burden during network training by quantizing the weights, activations, and gradients. However, existing methods cannot well achieve mixed-precision quantization for low-bit training and are commonly limited to INT8 precision. In this paper, we propose a novel low-bit integer training framework that, for the first time, achieves adaptive mixed-precision allocation (AMPA) for weights, activations, and gradients, and pushes the boundaries to a precision level below INT8. We develop a novel magnitude-based sensitivity measurement with regard to the quantization losses of weight, activation, and gradient quantization and the average gradient magnitudes, which is demonstrated as an upper bound of quantization influence in theory. We further design a layer-wise precision update strategy under observations on the quantization losses and their effects on model performance in low-bit training. Extensive experiments on different backbones and datasets show that, compared to INT8 quantization, the proposed method can achieve more than 38% BitOPs reduction with a tolerable loss below 2% in image classification, image segmentation, and language modeling.}
}
Endnote
%0 Conference Paper
%T AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training
%A Li Ding
%A Wen Fei
%A Yuyang Huang
%A Shuangrui Ding
%A Wenrui Dai
%A Chenglin Li
%A Junni Zou
%A Hongkai Xiong
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ding24b
%I PMLR
%P 10960--10977
%U https://proceedings.mlr.press/v235/ding24b.html
%V 235
%X Low-bit integer training emerges as a promising approach to mitigate the heavy burden during network training by quantizing the weights, activations, and gradients. However, existing methods cannot well achieve mixed-precision quantization for low-bit training and are commonly limited to INT8 precision. In this paper, we propose a novel low-bit integer training framework that, for the first time, achieves adaptive mixed-precision allocation (AMPA) for weights, activations, and gradients, and pushes the boundaries to a precision level below INT8. We develop a novel magnitude-based sensitivity measurement with regard to the quantization losses of weight, activation, and gradient quantization and the average gradient magnitudes, which is demonstrated as an upper bound of quantization influence in theory. We further design a layer-wise precision update strategy under observations on the quantization losses and their effects on model performance in low-bit training. Extensive experiments on different backbones and datasets show that, compared to INT8 quantization, the proposed method can achieve more than 38% BitOPs reduction with a tolerable loss below 2% in image classification, image segmentation, and language modeling.
APA
Ding, L., Fei, W., Huang, Y., Ding, S., Dai, W., Li, C., Zou, J. & Xiong, H. (2024). AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:10960-10977. Available from https://proceedings.mlr.press/v235/ding24b.html.