Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33811-33825, 2024.

Abstract

Post-Training Quantization (PTQ) is a vital technique for network compression and acceleration, gaining prominence as model sizes increase. This paper addresses a critical challenge in PTQ: the severe impact of outliers on the accuracy of quantized transformer architectures. Specifically, we introduce the concept of ‘reconstruction granularity’ as a novel solution to this issue, which has been overlooked in previous works. Our work provides theoretical insights into the role of reconstruction granularity in mitigating the outlier problem in transformer models. This theoretical framework is supported by empirical analysis demonstrating that varying the reconstruction granularity significantly influences quantization performance. Our findings indicate that different architectural designs necessitate distinct optimal reconstruction granularities. For instance, the multi-stage Swin Transformer architecture benefits from finer granularity, a deviation from the trends observed in ViT and DeiT models. We further develop an algorithm for determining the optimal reconstruction granularity for various ViT models, achieving state-of-the-art (SOTA) performance in PTQ. For example, with 4-bit quantization our method brings the Swin-Base model to a Top-1 accuracy of 82.24% on the ImageNet classification task, surpassing RepQ-ViT by 3.92% (82.24% vs. 78.32%). Similarly, our approach elevates ViT-Small to a Top-1 accuracy of 80.50%, outperforming NoisyQuant by 3.64% (80.50% vs. 76.86%). Code is available in the Supplementary Materials.
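The abstract's central idea — choosing how many consecutive blocks are grouped into one reconstruction unit during PTQ calibration — can be illustrated with a toy sketch. The following PyTorch snippet is a minimal illustration under stated assumptions, not the authors' released code: the class FakeQuantLinear, the reconstruct routine, and the granularity parameter are illustrative names, and the paper's actual outlier-aware slicing algorithm for selecting the granularity is not reproduced here.

import copy
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Linear layer with simple uniform fake-quantization of its weights (illustrative)."""
    def __init__(self, linear: nn.Linear, n_bits: int = 4):
        super().__init__()
        self.linear = copy.deepcopy(linear)   # keep the FP original untouched
        self.n_bits = n_bits
        # Per-tensor scale; a single outlier weight inflates this scale.
        qmax = 2 ** (n_bits - 1) - 1
        self.scale = nn.Parameter(self.linear.weight.detach().abs().max() / qmax)

    def forward(self, x):
        qmin, qmax = -(2 ** (self.n_bits - 1)), 2 ** (self.n_bits - 1) - 1
        w_int = torch.clamp(torch.round(self.linear.weight / self.scale), qmin, qmax)
        return nn.functional.linear(x, w_int * self.scale, self.linear.bias)

def reconstruct(fp_blocks, q_blocks, calib_x, granularity=2, steps=100, lr=1e-2):
    """Calibrate quantized blocks slice by slice.

    `granularity` is the number of consecutive blocks optimized jointly; the
    paper's finding is that the best choice depends on the architecture.
    """
    x_fp, x_q = calib_x, calib_x
    for start in range(0, len(fp_blocks), granularity):
        fp_slice = nn.Sequential(*fp_blocks[start:start + granularity])
        q_slice = nn.Sequential(*q_blocks[start:start + granularity])
        with torch.no_grad():
            target = fp_slice(x_fp)                      # FP reference output
        opt = torch.optim.Adam(
            [p for p in q_slice.parameters() if p.requires_grad], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.mse_loss(q_slice(x_q), target)
            loss.backward()
            opt.step()
        with torch.no_grad():                            # feed the next slice
            x_fp, x_q = target, q_slice(x_q)

# Toy usage: a stack of 4 "blocks", calibrated at granularity 2.
torch.manual_seed(0)
fp_blocks = [nn.Sequential(nn.Linear(16, 16), nn.GELU()) for _ in range(4)]
q_blocks = [nn.Sequential(FakeQuantLinear(blk[0]), nn.GELU()) for blk in fp_blocks]
reconstruct(fp_blocks, q_blocks, torch.randn(32, 16), granularity=2)

Sweeping granularity over, say, 1, 2, and 4 in this sketch mimics the paper's observation that reconstruction granularity is itself a tunable design choice rather than a fixed hyperparameter.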

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ma24f,
  title     = {Outlier-aware Slicing for Post-Training Quantization in Vision Transformer},
  author    = {Ma, Yuexiao and Li, Huixia and Zheng, Xiawu and Ling, Feng and Xiao, Xuefeng and Wang, Rui and Wen, Shilei and Chao, Fei and Ji, Rongrong},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {33811--33825},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ma24f/ma24f.pdf},
  url       = {https://proceedings.mlr.press/v235/ma24f.html}
}
Endnote
%0 Conference Paper
%T Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
%A Yuexiao Ma
%A Huixia Li
%A Xiawu Zheng
%A Feng Ling
%A Xuefeng Xiao
%A Rui Wang
%A Shilei Wen
%A Fei Chao
%A Rongrong Ji
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ma24f
%I PMLR
%P 33811--33825
%U https://proceedings.mlr.press/v235/ma24f.html
%V 235
APA
Ma, Y., Li, H., Zheng, X., Ling, F., Xiao, X., Wang, R., Wen, S., Chao, F. & Ji, R. (2024). Outlier-aware Slicing for Post-Training Quantization in Vision Transformer. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33811-33825. Available from https://proceedings.mlr.press/v235/ma24f.html.
