blksprs: A Triton Library for Block-Sparse Matrix Operations
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:139-150, 2026.
Abstract
In this paper, we introduce blksprs, a Triton-based PyTorch library for block-sparse matrix operations designed for machine-learning workloads. In contrast to existing approaches, blksprs supports a significantly wider range of operations, including matrix multiplication, softmax, gather, scatter, transposition, and (interleaved) repeat, among others. Furthermore, it supports flexible sparsity specification for all input and output matrices of these operations. These features facilitate applications that would previously have been infeasible. We provide a formal specification and demonstrate that blksprs can consistently outperform standard PyTorch and existing Triton implementations. In a practical evaluation, employing blksprs to train a Transformer neural network reduced training time by up to 35% and memory consumption by up to 45%, with minimal implementation overhead.
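To make the idea of a block-sparse operation concrete, the following is a minimal sketch in plain Python. It is not the blksprs API and does not use Triton or PyTorch; the block size `BLOCK`, the helper names `to_block_sparse` and `block_sparse_matmul`, and the 0/1 `layout` grid are all assumptions introduced for illustration. It shows only the general principle: a layout marks which blocks are present, and the multiplication skips every block that is absent.

```python
# Illustrative sketch only: plain Python lists, not the blksprs API.
BLOCK = 2  # block edge length (hypothetical choice for this example)


def to_block_sparse(dense, layout):
    """Keep only the blocks flagged 1 in `layout`, keyed by (block_row, block_col)."""
    blocks = {}
    for br, row in enumerate(layout):
        for bc, keep in enumerate(row):
            if keep:
                blocks[(br, bc)] = [
                    dense[br * BLOCK + i][bc * BLOCK:(bc + 1) * BLOCK]
                    for i in range(BLOCK)
                ]
    return blocks


def block_sparse_matmul(blocks, n_rows, dense_b):
    """Multiply a block-sparse matrix by a dense one, visiting stored blocks only."""
    n_cols = len(dense_b[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (br, bc), block in blocks.items():
        for i in range(BLOCK):
            for k in range(BLOCK):
                a = block[i][k]
                for j in range(n_cols):
                    out[br * BLOCK + i][j] += a * dense_b[bc * BLOCK + k][j]
    return out


# A 4x4 matrix whose off-diagonal 2x2 blocks are entirely zero.
A = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 5, 6],
     [0, 0, 7, 8]]
layout = [[1, 0],
          [0, 1]]  # 1 = block is stored, 0 = block is skipped
B = [[1, 0], [0, 1], [1, 1], [2, 0]]

blocks = to_block_sparse(A, layout)
result = block_sparse_matmul(blocks, len(A), B)
print(len(blocks), result)
```

Here only two of the four blocks are stored and multiplied, which is the source of the time and memory savings that block-sparse methods aim for. A GPU implementation would process the stored blocks in parallel rather than in nested Python loops.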