PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Cheng Qian, Julen Urain, Kevin Zakka, Jan Peters
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1194-1215, 2025.

Abstract

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training robot agents. For piano playing in particular, YouTube is full of videos of professional pianists playing a wide variety of songs. We leverage these demonstrations to learn a generalist piano-playing agent capable of playing arbitrary songs. Our framework is divided into three parts: a data preparation phase to extract the informative features from the YouTube videos, a policy learning phase to train song-specific expert policies from the demonstrations, and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate how the amount of training data influences the agent's ability to generalize to novel songs not available in the dataset. We show that the learned policy achieves an F1 score of up to 57% on unseen songs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-qian25a,
  title     = {PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations},
  author    = {Qian, Cheng and Urain, Julen and Zakka, Kevin and Peters, Jan},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {1194--1215},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/qian25a/qian25a.pdf},
  url       = {https://proceedings.mlr.press/v270/qian25a.html},
  abstract  = {In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, Youtube is full of videos of professional pianists playing a wide myriad of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing any arbitrary song. Our framework is divided into three parts: a data preparation phase to extract the informative features from the Youtube videos, a policy learning phase to train song-specific expert policies from the demonstrations and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 57% F1 score on unseen songs.}
}
Endnote
%0 Conference Paper
%T PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations
%A Cheng Qian
%A Julen Urain
%A Kevin Zakka
%A Jan Peters
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-qian25a
%I PMLR
%P 1194--1215
%U https://proceedings.mlr.press/v270/qian25a.html
%V 270
%X In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, Youtube is full of videos of professional pianists playing a wide myriad of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing any arbitrary song. Our framework is divided into three parts: a data preparation phase to extract the informative features from the Youtube videos, a policy learning phase to train song-specific expert policies from the demonstrations and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 57% F1 score on unseen songs.
APA
Qian, C., Urain, J., Zakka, K., & Peters, J. (2025). PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1194-1215. Available from https://proceedings.mlr.press/v270/qian25a.html.
