Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model

Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:58989-59004, 2025.

Abstract

For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models targeted for deployment on resource-constrained devices where energy efficiency is critical. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a BitShifting-based PowerNorm (BSPN), both designed to replace these energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieves a highly compressed binary-weight model that maintains competitive performance while delivering $27.16\times$ energy savings compared to BERT. We validate Sorbet through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference. Our code is publicly available at https://github.com/Kaiwen-Tang/Sorbet.
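The abstract names the shift-based mechanisms (PTsoftmax and BSPN) but does not spell out their formulations. As a rough, non-authoritative illustration of the general idea behind a shifting-based softmax, the NumPy sketch below replaces exp() with base-2 powers of integer-rounded logits, which map to bit shifts on integer hardware. The function name shift_softmax and every detail of the approximation are assumptions made for illustration, not the paper's PTsoftmax, and the final division is kept in floating point purely for readability.

import numpy as np

def shift_softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Sketch of a shift-based softmax approximation (assumed, not Sorbet's PTsoftmax):
    # replace exp(x) with 2**round(x), so each "exponential" becomes a bit shift on
    # integer hardware. The closing division is left as a float division for clarity;
    # a fully shift-based design would also replace the normalization step.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)  # max-subtraction for stability
    k = np.round(shifted)   # integer exponents: 2**k is a pure shift (values here are <= 0)
    pow2 = np.exp2(k)       # on integer hardware: 1 << k, or a right shift for negative k
    return pow2 / np.sum(pow2, axis=axis, keepdims=True)

# Quick check against the exact softmax on random logits.
x = np.random.randn(4, 8).astype(np.float32)
exact = np.exp(x - x.max(axis=-1, keepdims=True))
exact /= exact.sum(axis=-1, keepdims=True)
print("max abs deviation from exact softmax:", np.abs(shift_softmax(x) - exact).max())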

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-tang25l,
  title     = {Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model},
  author    = {Tang, Kaiwen and Yan, Zhanglu and Wong, Weng-Fai},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {58989--59004},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/tang25l/tang25l.pdf},
  url       = {https://proceedings.mlr.press/v267/tang25l.html}
}
Endnote
%0 Conference Paper
%T Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
%A Kaiwen Tang
%A Zhanglu Yan
%A Weng-Fai Wong
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-tang25l
%I PMLR
%P 58989--59004
%U https://proceedings.mlr.press/v267/tang25l.html
%V 267
APA
Tang, K., Yan, Z. & Wong, W.-F. (2025). Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:58989-59004. Available from https://proceedings.mlr.press/v267/tang25l.html.