Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies

Maximos Kaliakatsos-Papakostas, Dimos Makris, Konstantinos Soiledis, Konstantinos-Theodoros Tsamis
Proceedings of Machine Learning Research, PMLR 303:1-11, 2026.

Abstract

This paper addresses the problem of melodic harmonization—the automatic generation of harmonic accompaniments that complement a given melody—using non-autoregressive, encoder-only transformer models operating on a synchronized melody-harmony time grid. The proposed framework allows flexible conditioning, such as fixing chords at specific positions, while maintaining high generative quality. Comparative experiments show that single-encoder models outperform dual-encoder architectures despite using fewer parameters. Interestingly, harmony-related attention patterns emerge even when harmony tokens remain fully masked during training, and models using only cross-attention achieve comparable results, suggesting implicit modeling of harmony-harmony relations. Different inference unmasking strategies further reveal notable effects on harmonic structure and coherence.
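The abstract describes three ideas: harmony slots aligned one-to-one with melody steps on a shared grid, chords fixed at chosen positions as conditioning, and an order for unmasking the remaining slots at inference. The sketch below shows how these fit together in a generic iterative-unmasking loop. It is a minimal illustration under stated assumptions, not the authors' model. The chord vocabulary (`CHORD_TONES`), the rule-based `toy_predict` that stands in for a trained encoder, the `harmonize` function, and the three strategy names are all hypothetical. The paper's tokenization, grid resolution, and actual unmasking strategies may differ.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical chord vocabulary (indices 0, 1, 2) as pitch-class sets: C, F and G major triads.
CHORD_TONES = [{0, 4, 7}, {5, 9, 0}, {7, 11, 2}]

Grid = List[Optional[int]]  # one harmony slot per melody time step; None means masked


def toy_predict(melody: Sequence[int], harmony: Grid) -> List[List[float]]:
    """Hypothetical stand-in for an encoder: one probability row over chords per grid step.

    Each row looks at both neighbours, loosely mimicking bidirectional attention;
    a real system would use a trained transformer instead of this hand-written rule.
    """
    rows = []
    for t, pc in enumerate(melody):
        scores = []
        for c, tones in enumerate(CHORD_TONES):
            s = 1.0 if pc in tones else 0.1  # favour chords that contain the melody note
            for n in (t - 1, t + 1):
                if 0 <= n < len(harmony) and harmony[n] == c:
                    s += 0.5  # favour continuity with neighbours that are already decided
            scores.append(s)
        total = sum(scores)
        rows.append([s / total for s in scores])
    return rows


def harmonize(
    melody: Sequence[int],
    fixed: Dict[int, int],
    predict: Callable[[Sequence[int], Grid], List[List[float]]] = toy_predict,
    strategy: str = "confidence",
    seed: int = 0,
) -> List[int]:
    """Iteratively unmask harmony slots, committing one chord per step."""
    rng = random.Random(seed)
    # Fixed chords are placed on the grid up front and never re-predicted (conditioning).
    harmony: Grid = [fixed.get(t) for t in range(len(melody))]
    while any(h is None for h in harmony):
        probs = predict(melody, harmony)  # re-run the model on the partially filled grid
        masked = [t for t, h in enumerate(harmony) if h is None]
        if strategy == "confidence":
            t = max(masked, key=lambda i: max(probs[i]))  # most certain slot first
        elif strategy == "left_to_right":
            t = masked[0]  # earliest masked slot first
        elif strategy == "random":
            t = rng.choice(masked)  # any masked slot, seeded for reproducibility
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        harmony[t] = max(range(len(probs[t])), key=probs[t].__getitem__)  # greedy argmax
    return [h for h in harmony if h is not None]


melody = [4, 7, 5, 9, 11, 2, 4]  # pitch classes, one per grid step
print(harmonize(melody, fixed={}, strategy="left_to_right"))  # [0, 0, 1, 1, 2, 2, 0]
print(harmonize(melody, fixed={6: 2}, strategy="left_to_right"))  # [0, 0, 1, 1, 2, 2, 2]
```

Because the model is re-run after every commitment, the order in which slots are unmasked changes what context each later prediction sees. This is the mechanism by which, in the paper's findings, inference strategies can affect harmonic structure and coherence. Fixing the final chord, for example, lets the bidirectional predictor take that chord into account when it fills the preceding slot.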

Cite this Paper


BibTeX
@InProceedings{pmlr-v303-kaliakatsos-papakostas26a,
  title = {Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies},
  author = {Kaliakatsos-Papakostas, Maximos and Makris, Dimos and Soiledis, Konstantinos and Tsamis, Konstantinos-Theodoros},
  booktitle = {Proceedings of Machine Learning Research},
  pages = {1--11},
  year = {2026},
  editor = {Herremans, Dorien and Bhandari, Keshav and Roy, Abhinaba and Colton, Simon and Barthet, Mathieu},
  volume = {303},
  series = {Proceedings of Machine Learning Research},
  month = {26 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v303/main/assets/kaliakatsos-papakostas26a/kaliakatsos-papakostas26a.pdf},
  url = {https://proceedings.mlr.press/v303/kaliakatsos-papakostas26a.html},
  abstract = {This paper addresses the problem of melodic harmonization---the automatic generation of harmonic accompaniments that complement a given melody---using non-autoregressive, encoder-only transformer models operating on a synchronized melody-harmony time grid. The proposed framework allows flexible conditioning, such as fixing chords at specific positions, while maintaining high generative quality. Comparative experiments show that single-encoder models outperform dual-encoder architectures despite using fewer parameters. Interestingly, harmony-related attention patterns emerge even when harmony tokens remain fully masked during training, and models using only cross-attention achieve comparable results, suggesting implicit modeling of harmony-harmony relations. Different inference unmasking strategies further reveal notable effects on harmonic structure and coherence.}
}
Endnote
%0 Conference Paper
%T Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies
%A Maximos Kaliakatsos-Papakostas
%A Dimos Makris
%A Konstantinos Soiledis
%A Konstantinos-Theodoros Tsamis
%B Proceedings of Machine Learning Research
%C Proceedings of Machine Learning Research
%D 2026
%E Dorien Herremans
%E Keshav Bhandari
%E Abhinaba Roy
%E Simon Colton
%E Mathieu Barthet
%F pmlr-v303-kaliakatsos-papakostas26a
%I PMLR
%P 1--11
%U https://proceedings.mlr.press/v303/kaliakatsos-papakostas26a.html
%V 303
%X This paper addresses the problem of melodic harmonization, the automatic generation of harmonic accompaniments that complement a given melody, using non-autoregressive, encoder-only transformer models operating on a synchronized melody-harmony time grid. The proposed framework allows flexible conditioning, such as fixing chords at specific positions, while maintaining high generative quality. Comparative experiments show that single-encoder models outperform dual-encoder architectures despite using fewer parameters. Interestingly, harmony-related attention patterns emerge even when harmony tokens remain fully masked during training, and models using only cross-attention achieve comparable results, suggesting implicit modeling of harmony-harmony relations. Different inference unmasking strategies further reveal notable effects on harmonic structure and coherence.
APA
Kaliakatsos-Papakostas, M., Makris, D., Soiledis, K. & Tsamis, K. (2026). Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies. Proceedings of Machine Learning Research, in Proceedings of Machine Learning Research 303:1-11 Available from https://proceedings.mlr.press/v303/kaliakatsos-papakostas26a.html.
