Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration

Zhengyang Zhuge, Peisong Wang, Xingting Yao, Jian Cheng
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:62768-62778, 2024.

Abstract

Spiking Transformers have recently exhibited performance close to that of Artificial Neural Networks (ANNs) while enjoying the inherent energy efficiency of Spiking Neural Networks (SNNs). However, training Spiking Transformers on GPUs is considerably more time-consuming than training their ANN counterparts, despite the energy-efficient inference enabled by neuromorphic computation. In this paper, we investigate token sparsification for efficient training of Spiking Transformers and find that conventional methods suffer from noticeable performance degradation. We analyze the issue and propose Sparsification with Timestep-wise Anchor Token and dual Alignments (STATA). The Timestep-wise Anchor Token enables precise identification of important tokens across timesteps based on standardized criteria. In addition, the dual Alignments incorporate both Intra and Inter Alignment of the attention maps, fostering the learning of the inferior attention. Extensive experiments thoroughly demonstrate the effectiveness of STATA, which achieves up to $\sim$1.53$\times$ training speedup and $\sim$48% energy reduction with comparable performance on various datasets and architectures.
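
To make the abstract's description concrete, below is a minimal, hypothetical sketch of timestep-wise token sparsification in a spiking transformer. The anchor vector, the similarity-based scoring rule, and the keep ratio are assumptions for illustration only and are not taken from the paper; the dual Alignments (plausibly auxiliary losses matching attention maps within and across models or timesteps) are not shown here.

    # Hedged sketch (not the authors' code): score every token at every timestep
    # against a single shared anchor, so the selection criterion is standardized
    # across timesteps, then keep only the top-k tokens per timestep and sample.
    import torch

    def select_tokens_per_timestep(x, anchor, keep_ratio=0.5):
        """
        x:      spiking features, shape (T, B, N, D) -- timesteps, batch, tokens, dim
        anchor: hypothetical shared anchor vector, shape (D,)
        Returns a boolean keep-mask of shape (T, B, N); downstream attention/MLP
        blocks would then process only the kept tokens.
        """
        T, B, N, D = x.shape
        k = max(1, int(N * keep_ratio))
        # Similarity of each token to the same anchor at every timestep.
        scores = torch.einsum('tbnd,d->tbn', x.float(), anchor.float())
        # Retain the top-k tokens per (timestep, sample).
        topk = scores.topk(k, dim=-1).indices
        mask = torch.zeros(T, B, N, dtype=torch.bool)
        mask.scatter_(-1, topk, torch.ones_like(topk, dtype=torch.bool))
        return mask

    # Toy usage: 4 timesteps, batch of 2, 16 tokens, 8-dim binary spike features.
    x = (torch.rand(4, 2, 16, 8) > 0.5).float()
    anchor = torch.randn(8)
    mask = select_tokens_per_timestep(x, anchor, keep_ratio=0.5)
    print(mask.shape, mask.sum(dim=-1))  # 8 of 16 tokens kept per timestep and sample
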

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhuge24b,
  title     = {Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration},
  author    = {Zhuge, Zhengyang and Wang, Peisong and Yao, Xingting and Cheng, Jian},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {62768--62778},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhuge24b/zhuge24b.pdf},
  url       = {https://proceedings.mlr.press/v235/zhuge24b.html},
  abstract  = {Nowadays Spiking Transformers have exhibited remarkable performance close to Artificial Neural Networks (ANNs), while enjoying the inherent energy-efficiency of Spiking Neural Networks (SNNs). However, training Spiking Transformers on GPUs is considerably more time-consuming compared to the ANN counterparts, despite the energy-efficient inference through neuromorphic computation. In this paper, we investigate the token sparsification technique for efficient training of Spiking Transformer and find conventional methods suffer from noticeable performance degradation. We analyze the issue and propose our Sparsification with Timestep-wise Anchor Token and dual Alignments (STATA). Timestep-wise Anchor Token enables precise identification of important tokens across timesteps based on standardized criteria. Additionally, dual Alignments incorporate both Intra and Inter Alignment of the attention maps, fostering the learning of inferior attention. Extensive experiments show the effectiveness of STATA thoroughly, which demonstrates up to $\sim$1.53$\times$ training speedup and $\sim$48% energy reduction with comparable performance on various datasets and architectures.}
}
Endnote
%0 Conference Paper
%T Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration
%A Zhengyang Zhuge
%A Peisong Wang
%A Xingting Yao
%A Jian Cheng
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhuge24b
%I PMLR
%P 62768--62778
%U https://proceedings.mlr.press/v235/zhuge24b.html
%V 235
%X Nowadays Spiking Transformers have exhibited remarkable performance close to Artificial Neural Networks (ANNs), while enjoying the inherent energy-efficiency of Spiking Neural Networks (SNNs). However, training Spiking Transformers on GPUs is considerably more time-consuming compared to the ANN counterparts, despite the energy-efficient inference through neuromorphic computation. In this paper, we investigate the token sparsification technique for efficient training of Spiking Transformer and find conventional methods suffer from noticeable performance degradation. We analyze the issue and propose our Sparsification with Timestep-wise Anchor Token and dual Alignments (STATA). Timestep-wise Anchor Token enables precise identification of important tokens across timesteps based on standardized criteria. Additionally, dual Alignments incorporate both Intra and Inter Alignment of the attention maps, fostering the learning of inferior attention. Extensive experiments show the effectiveness of STATA thoroughly, which demonstrates up to $\sim$1.53$\times$ training speedup and $\sim$48% energy reduction with comparable performance on various datasets and architectures.
APA
Zhuge, Z., Wang, P., Yao, X. & Cheng, J. (2024). Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:62768-62778. Available from https://proceedings.mlr.press/v235/zhuge24b.html.
