Encoder-Only Transformers for Melodic Harmonization: Representation Emergence and Inference Strategies
Proceedings of Machine Learning Research, PMLR 303:1-11, 2026.
Abstract
This paper addresses melodic harmonization, the automatic generation of harmonic accompaniment that complements a given melody, using non-autoregressive, encoder-only transformer models that operate on a synchronized melody-harmony time grid. The proposed framework supports flexible conditioning, such as fixing chords at specific positions, while maintaining high generative quality. Comparative experiments show that single-encoder models outperform dual-encoder architectures despite using fewer parameters. Interestingly, harmony-related attention patterns emerge even when harmony tokens remain fully masked during training, and models using only cross-attention achieve comparable results, suggesting that harmony-harmony relations are modeled implicitly. The choice of unmasking strategy at inference time also has a notable effect on harmonic structure and coherence.
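To make the inference setting concrete, the following is a minimal sketch of iterative unmasking on a melody-harmony grid with some chords fixed in advance. It is not the paper's implementation: the abstract does not specify the unmasking strategies, so the confidence-ordered schedule shown here is an assumption, and `dummy_model`, `iterative_unmask`, `MASK`, the chord vocabulary, and the `per_step` parameter are all hypothetical names introduced for illustration. The stand-in model returns random predictions where a trained encoder would return a distribution over chords for each masked slot.

```python
import random

MASK = None  # hypothetical marker for a harmony slot that has not been decided yet


def dummy_model(melody, harmony, vocab, rng):
    """Stand-in for a trained encoder (assumption, not the paper's model).

    Returns {position: (chord, confidence)} for every masked harmony slot.
    A real model would condition on the melody and on the chords already
    placed; this placeholder ignores both and guesses at random.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, h in enumerate(harmony)
        if h is MASK
    }


def iterative_unmask(melody, vocab, fixed=None, per_step=2, seed=0):
    """Fill the harmony grid by committing the most confident slots first."""
    rng = random.Random(seed)
    harmony = [MASK] * len(melody)

    # Conditioning: user-fixed chords are placed first and never overwritten,
    # because only masked slots are ever predicted.
    for pos, chord in (fixed or {}).items():
        harmony[pos] = chord

    while any(h is MASK for h in harmony):
        preds = dummy_model(melody, harmony, vocab, rng)
        # Assumed strategy: unmask the `per_step` highest-confidence positions.
        ranked = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (chord, _conf) in ranked[:per_step]:
            harmony[pos] = chord
    return harmony


melody = [60, 62, 64, 65, 67, 65, 64, 62]  # MIDI pitches, one per grid step
vocab = ["C", "F", "G", "Am"]
result = iterative_unmask(melody, vocab, fixed={0: "C", 7: "G"})
```

Other schedules, such as unmasking left to right or in random order, could be obtained by changing only the ranking step, which is one way the choice of strategy could be varied independently of the model.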