blksprs: A Triton Library for Block-Sparse Matrix Operations

Felix Schön, Hans Tompits
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:139-150, 2026.

Abstract

In this paper, we introduce blksprs, a Triton-based PyTorch library for block-sparse matrix operations designed for machine-learning approaches. In contrast to existing approaches, blksprs supports a significantly wider range of operations, including but not limited to matrix multiplication, softmax, gather, scatter, transposition, (interleaved) repeat, and more. Furthermore, it supports flexible sparsity specification for all input and output matrices of these operations. These features facilitate applications that would have previously been infeasible. We provide a formal specification and demonstrate that blksprs can consistently outperform standard PyTorch and existing Triton implementations. In a practical evaluation, we were able to reduce training time by up to 35% and memory consumption by up to 45% when employing blksprs for the training of a Transformer neural network, with minimal implementational overhead.
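To make the idea of a block-sparse operation with separate sparsity layouts for inputs and output concrete, here is a minimal plain-Python sketch. It is not the blksprs API and does not use Triton or PyTorch; the function names (`to_blocks`, `block_matmul`, `to_dense`), the dictionary-of-blocks storage, and the boolean layout grids are all assumptions made purely for illustration.

```python
# Illustrative only: plain-Python block-sparse matmul; NOT the blksprs API.
# All names and the storage format below are hypothetical.

def to_blocks(dense, layout, bs):
    """Keep only the bs x bs blocks that the layout marks as present."""
    blocks = {}
    for bi, row in enumerate(layout):
        for bj, present in enumerate(row):
            if present:
                blocks[(bi, bj)] = [
                    dense[bi * bs + r][bj * bs:(bj + 1) * bs] for r in range(bs)
                ]
    return blocks


def block_matmul(a_blocks, b_blocks, out_layout, inner_blocks, bs):
    """Compute C = A @ B, but only for output blocks present in out_layout."""
    out = {}
    for bi, row in enumerate(out_layout):
        for bj, present in enumerate(row):
            if not present:
                continue  # output block not requested: skip it entirely
            acc = [[0.0] * bs for _ in range(bs)]
            for bk in range(inner_blocks):
                a = a_blocks.get((bi, bk))
                b = b_blocks.get((bk, bj))
                if a is None or b is None:
                    continue  # a missing input block contributes only zeros
                for r in range(bs):
                    for c in range(bs):
                        acc[r][c] += sum(a[r][k] * b[k][c] for k in range(bs))
            out[(bi, bj)] = acc
    return out


def to_dense(blocks, n_block_rows, n_block_cols, bs):
    """Expand stored blocks back into a dense matrix, zero-filling the rest."""
    dense = [[0.0] * (n_block_cols * bs) for _ in range(n_block_rows * bs)]
    for (bi, bj), block in blocks.items():
        for r in range(bs):
            for c in range(bs):
                dense[bi * bs + r][bj * bs + c] = block[r][c]
    return dense


bs = 2  # block size
a_dense = [[1.0, 2.0, 0.0, 0.0],
           [3.0, 4.0, 0.0, 0.0],
           [0.0, 0.0, 5.0, 6.0],
           [0.0, 0.0, 7.0, 8.0]]
a_layout = [[True, False], [False, True]]
b_dense = [[1.0, 0.0, 2.0, 0.0],
           [0.0, 1.0, 0.0, 2.0],
           [0.0, 0.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 1.0]]
b_layout = [[True, True], [False, True]]
out_layout = [[True, True], [False, True]]

c_blocks = block_matmul(to_blocks(a_dense, a_layout, bs),
                        to_blocks(b_dense, b_layout, bs),
                        out_layout, inner_blocks=2, bs=bs)
c_dense = to_dense(c_blocks, 2, 2, bs)
```

In this toy setup only three of the four output blocks are ever computed or stored, and each of them touches only the input blocks that exist. A GPU library would perform the same skipping inside fused kernels rather than Python loops.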

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-schon26a,
  title = {blksprs: A Triton Library for Block-Sparse Matrix Operations},
  author = {Sch\"{o}n, Felix and Tompits, Hans},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {139--150},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/schon26a/schon26a.pdf},
  url = {https://proceedings.mlr.press/v318/schon26a.html},
  abstract = {In this paper, we introduce blksprs, a Triton-based PyTorch library for block-sparse matrix operations designed for machine-learning approaches. In contrast to existing approaches, blksprs supports a significantly wider range of operations, including but not limited to matrix multiplication, softmax, gather, scatter, transposition, (interleaved) repeat, and more. Furthermore, it supports flexible sparsity specification for all input and output matrices of these operations. These features facilitate applications that would have previously been infeasible. We provide a formal specification and demonstrate that blksprs can consistently outperform standard PyTorch and existing Triton implementations. In a practical evaluation, we were able to reduce training time by up to 35% and memory consumption by up to 45% when employing blksprs for the training of a Transformer neural network, with minimal implementational overhead.}
}
Endnote
%0 Conference Paper
%T blksprs: A Triton Library for Block-Sparse Matrix Operations
%A Felix Schön
%A Hans Tompits
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-schon26a
%I PMLR
%P 139--150
%U https://proceedings.mlr.press/v318/schon26a.html
%V 318
%X In this paper, we introduce blksprs, a Triton-based PyTorch library for block-sparse matrix operations designed for machine-learning approaches. In contrast to existing approaches, blksprs supports a significantly wider range of operations, including but not limited to matrix multiplication, softmax, gather, scatter, transposition, (interleaved) repeat, and more. Furthermore, it supports flexible sparsity specification for all input and output matrices of these operations. These features facilitate applications that would have previously been infeasible. We provide a formal specification and demonstrate that blksprs can consistently outperform standard PyTorch and existing Triton implementations. In a practical evaluation, we were able to reduce training time by up to 35% and memory consumption by up to 45% when employing blksprs for the training of a Transformer neural network, with minimal implementational overhead.
APA
Schön, F. & Tompits, H. (2026). blksprs: A Triton Library for Block-Sparse Matrix Operations. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:139-150. Available from https://proceedings.mlr.press/v318/schon26a.html.
