Reinforcement Learning of Phase Oscillators for Fast Adaptation to Moving Targets

Guilherme Maeda, Okan Koc, Jun Morimoto
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:630-640, 2018.

Abstract

Online movement generation in tasks involving real humanoid robots interacting with fast-moving targets is extremely difficult. This paper approaches this problem via imitation and reinforcement learning using phase variables. Imitation learning is used to acquire primitive trajectories of the demonstrator interacting with the target. The temporal progress of the robot is represented as a function of the target's phase. Using a phase oscillator formulation, reinforcement learning optimizes a temporal policy such that the robot can quickly react to large or unexpected changes in the target movement. The phase representation decouples the temporal and spatial problems, allowing the use of fast online solutions. The methodology is applicable to both cyclic and single-stroke movements. We applied the proposed method on a real bi-manual humanoid upper body with 14 degrees of freedom, where the robot had to repeatedly push a ball hanging in front of it. In simulation, we show a human-robot interaction scenario where the robot changed its role from giver to receiver as a function of the interaction reward.
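The phase-oscillator idea at the core of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the coupling law, gains, and Euler integration below are illustrative assumptions. A Kuramoto-style coupling drives the robot's phase toward the target's phase, so the timing of the motion adapts online while the spatial trajectory (learned by imitation) is left untouched:

```python
import math

def step_phase(phi_robot, phi_target, omega, k, dt):
    """One Euler step of a Kuramoto-style phase oscillator: the robot
    advances at its nominal frequency omega, plus a coupling term that
    pulls its phase toward the target's phase."""
    dphi = omega + k * math.sin(phi_target - phi_robot)
    return (phi_robot + dphi * dt) % (2.0 * math.pi)

def phase_error(a, b):
    """Smallest signed angular difference a - b, wrapped to (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

# Demo: the target oscillates at 1 Hz; the robot starts 1 rad out of
# phase and synchronizes through the coupling alone.
omega = 2.0 * math.pi   # nominal frequency in rad/s (assumed)
k, dt = 5.0, 0.001      # coupling gain and time step (assumed)
phi_t, phi_r = 1.0, 0.0
for _ in range(5000):   # simulate 5 seconds
    phi_t = (phi_t + omega * dt) % (2.0 * math.pi)
    phi_r = step_phase(phi_r, phi_t, omega, k, dt)

print(abs(phase_error(phi_t, phi_r)))  # near zero once synchronized
```

In the paper, reinforcement learning shapes the temporal policy from the interaction reward; here the coupling gain `k` is simply fixed to show the synchronization behavior the temporal policy exploits.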

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-maeda18a,
  title     = {Reinforcement Learning of Phase Oscillators for Fast Adaptation to Moving Targets},
  author    = {Maeda, Guilherme and Koc, Okan and Morimoto, Jun},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {630--640},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/maeda18a/maeda18a.pdf},
  url       = {https://proceedings.mlr.press/v87/maeda18a.html},
  abstract  = {Online movement generation in tasks involving real humanoid robots interacting with fast-moving targets is extremely difficult. This paper approaches this problem via imitation and reinforcement learning using phase variables. Imitation learning is used to acquire primitive trajectories of the demonstrator interacting with the target. The temporal progress of the robot is represented as a function of the target’s phase. Using a phase oscillator formulation, reinforcement learning optimizes a temporal policy such that the robot can quickly react to large/unexpected changes in the target movement. The phase representation decouples the temporal and spatial problems allowing the use of fast online solutions. The methodology is applicable in both cyclic and single-stroke movements. We applied the proposed method on a real bi-manual humanoid upper body with 14 degrees-of-freedom where the robot had to repeatedly push a ball hanging in front of it. In simulation, we show a human-robot interaction scenario where the robot changed its role from giver to receiver as a function of the interaction reward.}
}
Endnote
%0 Conference Paper
%T Reinforcement Learning of Phase Oscillators for Fast Adaptation to Moving Targets
%A Guilherme Maeda
%A Okan Koc
%A Jun Morimoto
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-maeda18a
%I PMLR
%P 630--640
%U https://proceedings.mlr.press/v87/maeda18a.html
%V 87
%X Online movement generation in tasks involving real humanoid robots interacting with fast-moving targets is extremely difficult. This paper approaches this problem via imitation and reinforcement learning using phase variables. Imitation learning is used to acquire primitive trajectories of the demonstrator interacting with the target. The temporal progress of the robot is represented as a function of the target’s phase. Using a phase oscillator formulation, reinforcement learning optimizes a temporal policy such that the robot can quickly react to large/unexpected changes in the target movement. The phase representation decouples the temporal and spatial problems allowing the use of fast online solutions. The methodology is applicable in both cyclic and single-stroke movements. We applied the proposed method on a real bi-manual humanoid upper body with 14 degrees-of-freedom where the robot had to repeatedly push a ball hanging in front of it. In simulation, we show a human-robot interaction scenario where the robot changed its role from giver to receiver as a function of the interaction reward.
APA
Maeda, G., Koc, O., & Morimoto, J. (2018). Reinforcement Learning of Phase Oscillators for Fast Adaptation to Moving Targets. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:630-640. Available from https://proceedings.mlr.press/v87/maeda18a.html.
