SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:16802-16812, 2024.

Abstract

Transformers have become foundational architectures for both natural language and computer vision tasks. However, their high computational cost makes them challenging to deploy on resource-constrained devices. This paper investigates the computational bottleneck modules of efficient transformers, i.e., normalization layers and attention modules. LayerNorm is commonly used in transformer architectures but is not computationally friendly because it must compute statistics during inference. However, replacing LayerNorm with the more efficient BatchNorm in transformers often leads to inferior performance and training collapse. To address this problem, we propose a novel method named PRepBN that progressively replaces LayerNorm with re-parameterized BatchNorm during training. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective in achieving strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, our SLAB-Swin obtains 83.6% top-1 accuracy on ImageNet-1K with 16.2 ms latency, which is 2.4 ms lower than that of Flatten-Swin while achieving 0.1% higher accuracy. We also evaluated our method on the language modeling task and obtained comparable performance with lower latency. Code is publicly available at https://github.com/xinghaochen/SLAB and https://github.com/mindspore-lab/models/tree/master/research/huawei-noah/SLAB.
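
To make the PRepBN idea concrete, below is a minimal PyTorch sketch, assuming the progressive replacement blends LayerNorm with a re-parameterized BatchNorm using a weight that decays from 1 to 0 over training. The class name `ProgressiveNorm`, the linear decay schedule, and the `eta`-scaled identity branch are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class ProgressiveNorm(nn.Module):
    """Hypothetical PRepBN-style block: during training the output is a blend
    of LayerNorm and re-parameterized BatchNorm; the LayerNorm weight decays
    from 1 to 0 so that only BatchNorm-style computation remains at inference."""
    def __init__(self, dim: int, total_steps: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.bn = nn.BatchNorm1d(dim)
        # Learnable scale on the identity branch ("re-parameterized" BatchNorm);
        # at inference this linear term could be folded into the BN affine params.
        self.eta = nn.Parameter(torch.ones(1))
        self.total_steps = total_steps
        self.register_buffer("step", torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); BatchNorm1d expects (batch, dim, tokens).
        bn_out = self.bn(x.transpose(1, 2)).transpose(1, 2) + self.eta * x
        if not self.training:
            return bn_out
        # Assumed linear decay of the LayerNorm weight over training steps.
        lam = max(0.0, 1.0 - self.step.item() / self.total_steps)
        self.step += 1
        return lam * self.ln(x) + (1.0 - lam) * bn_out
```

Similarly, the sketch below illustrates one plausible form of a simplified linear attention block, assuming a ReLU feature map in place of softmax (so the attention cost is linear in sequence length) plus a depthwise-convolution branch over the values for local detail. The module layout and parameter names are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleLinearAttention(nn.Module):
    """Hypothetical simplified linear attention: ReLU(Q) and ReLU(K) act as
    kernel feature maps, so attention is computed as Q'(K'^T V) in O(N) time,
    with a depthwise convolution on the values as a local branch."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.dwc = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        h, d = self.num_heads, C // self.num_heads
        q, k, v = self.qkv(x).reshape(B, N, 3, h, d).permute(2, 0, 3, 1, 4)
        q, k = F.relu(q), F.relu(k)                      # kernel feature maps
        kv = torch.einsum("bhnd,bhne->bhde", k, v)       # (d x d) per head
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
        out = out.transpose(1, 2).reshape(B, N, C)
        # Local branch: depthwise convolution over the value tokens.
        v_local = v.transpose(1, 2).reshape(B, N, C)
        out = out + self.dwc(v_local.transpose(1, 2)).transpose(1, 2)
        return self.proj(out)
```

In both sketches the training-time extras (the LayerNorm blend, the eta identity term) are arranged so that only plain BatchNorm and linear operations remain at inference, which is consistent with the latency advantage the abstract reports.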

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-guo24a,
  title     = {{SLAB}: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization},
  author    = {Guo, Jialong and Chen, Xinghao and Tang, Yehui and Wang, Yunhe},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {16802--16812},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/guo24a/guo24a.pdf},
  url       = {https://proceedings.mlr.press/v235/guo24a.html}
}
Endnote
%0 Conference Paper
%T SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
%A Jialong Guo
%A Xinghao Chen
%A Yehui Tang
%A Yunhe Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-guo24a
%I PMLR
%P 16802--16812
%U https://proceedings.mlr.press/v235/guo24a.html
%V 235
APA
Guo, J., Chen, X., Tang, Y. & Wang, Y. (2024). SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:16802-16812. Available from https://proceedings.mlr.press/v235/guo24a.html.
