Learning to Groove with Inverse Sequence Transformations

Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2269-2279, 2019.

Abstract

We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Though seq2seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix, Vid2Vid) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and learning to invert them has real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
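The core data-creation idea described above (apply a simple, well-defined transformation such as quantization to a recorded performance, then train a model to invert it) can be illustrated with a short sketch. This is not the authors' code; the field names, tempo, and 16th-note grid are illustrative assumptions about how such (score, performance) pairs might be constructed.

```python
# Minimal sketch: build paired training data by quantizing an expressive
# drum performance to a metrical grid and flattening its dynamics.
# A seq2seq model would then learn the inverse mapping
# (quantized "score" -> expressive performance).

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DrumHit:
    onset_sec: float   # onset time in seconds
    velocity: int      # MIDI velocity (1-127), carries the dynamics
    pitch: int         # MIDI pitch identifying the drum voice


def quantize(hits: List[DrumHit], bpm: float = 120.0,
             steps_per_quarter: int = 4) -> List[DrumHit]:
    """Snap onsets to the nearest grid step and discard expressive dynamics."""
    step = 60.0 / bpm / steps_per_quarter  # duration of one grid step (16th note)
    return [
        DrumHit(onset_sec=round(h.onset_sec / step) * step,
                velocity=100,  # flatten dynamics to a constant value
                pitch=h.pitch)
        for h in hits
    ]


def make_pair(performance: List[DrumHit]) -> Tuple[List[DrumHit], List[DrumHit]]:
    """Return (model input, training target): the quantized score derived from
    the performance, and the original expressive performance itself."""
    return quantize(performance), performance
```

Because the transformation is applied programmatically to existing recordings, arbitrarily many aligned pairs can be generated without manual annotation, which is what removes the need for a painstakingly aligned corpus.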

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-gillick19a,
  title     = {Learning to Groove with Inverse Sequence Transformations},
  author    = {Gillick, Jon and Roberts, Adam and Engel, Jesse and Eck, Douglas and Bamman, David},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2269--2279},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/gillick19a/gillick19a.pdf},
  url       = {https://proceedings.mlr.press/v97/gillick19a.html},
  abstract  = {We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Though seq2seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix, Vid2Vid) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and learning to invert them has real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).}
}
Endnote
%0 Conference Paper
%T Learning to Groove with Inverse Sequence Transformations
%A Jon Gillick
%A Adam Roberts
%A Jesse Engel
%A Douglas Eck
%A David Bamman
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-gillick19a
%I PMLR
%P 2269--2279
%U https://proceedings.mlr.press/v97/gillick19a.html
%V 97
%X We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Though seq2seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix, Vid2Vid) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and learning to invert them has real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
APA
Gillick, J., Roberts, A., Engel, J., Eck, D., & Bamman, D. (2019). Learning to Groove with Inverse Sequence Transformations. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2269-2279. Available from https://proceedings.mlr.press/v97/gillick19a.html.
