On Efficient Computational Methods for Transformer-Based Symbolic Music Generation

Felix Schön, Hans Tompits
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:1204-1209, 2026.

Abstract

Although Transformer models have shown particular promise for symbolic music generation, their quadratic computational complexity with respect to sequence length presents significant challenges for longer musical pieces. In this paper, we describe the goals and progress of an ongoing dissertation addressing these challenges through three interconnected research directions, aiming at the development of (i) novel tokenisation strategies that significantly reduce sequence lengths while maintaining generation quality, (ii) efficient methods for incorporating arbitrary musical information into attention mechanisms through both additive and multiplicative approaches, yielding statistically significant improvements over strong baselines, and (iii) a hierarchical attention architecture that explicitly models the multi-level structure of music across beats, bars, and larger segments using specialised block-sparse attention patterns. Results achieved so far support our central hypothesis that domain-aware architectural choices, informed by music theory, can yield significant improvements over generic sequence-modelling approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-schon26d,
  title = {On Efficient Computational Methods for Transformer-Based Symbolic Music Generation},
  author = {Sch\"{o}n, Felix and Tompits, Hans},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {1204--1209},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/schon26d/schon26d.pdf},
  url = {https://proceedings.mlr.press/v318/schon26d.html},
  abstract = {Although Transformer models have shown particular promise for symbolic music generation, their quadratic computational complexity with respect to sequence length presents significant challenges for longer musical pieces. In this paper, we describe the goals and progress of an ongoing dissertation addressing these challenges through three interconnected research directions, aiming at the development of (i) novel tokenisation strategies that significantly reduce sequence lengths while maintaining generation quality, (ii) efficient methods for incorporating arbitrary musical information into attention mechanisms through both additive and multiplicative approaches, yielding statistically significant improvements over strong baselines, and (iii) a hierarchical attention architecture that explicitly models the multi-level structure of music across beats, bars, and larger segments using specialised block-sparse attention patterns. Results achieved so far support our central hypothesis that domain-aware architectural choices, informed by music theory, can yield significant improvements over generic sequence-modelling approaches.}
}
Endnote
%0 Conference Paper
%T On Efficient Computational Methods for Transformer-Based Symbolic Music Generation
%A Felix Schön
%A Hans Tompits
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-schon26d
%I PMLR
%P 1204--1209
%U https://proceedings.mlr.press/v318/schon26d.html
%V 318
%X Although Transformer models have shown particular promise for symbolic music generation, their quadratic computational complexity with respect to sequence length presents significant challenges for longer musical pieces. In this paper, we describe the goals and progress of an ongoing dissertation addressing these challenges through three interconnected research directions, aiming at the development of (i) novel tokenisation strategies that significantly reduce sequence lengths while maintaining generation quality, (ii) efficient methods for incorporating arbitrary musical information into attention mechanisms through both additive and multiplicative approaches, yielding statistically significant improvements over strong baselines, and (iii) a hierarchical attention architecture that explicitly models the multi-level structure of music across beats, bars, and larger segments using specialised block-sparse attention patterns. Results achieved so far support our central hypothesis that domain-aware architectural choices, informed by music theory, can yield significant improvements over generic sequence-modelling approaches.
APA
Schön, F. & Tompits, H. (2026). On Efficient Computational Methods for Transformer-Based Symbolic Music Generation. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:1204-1209. Available from https://proceedings.mlr.press/v318/schon26d.html.
