Reinforcement Learning of Phase Oscillators for Fast Adaptation to Moving Targets
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:630-640, 2018.
Online movement generation in tasks involving real humanoid robots interacting with fast-moving targets is extremely difficult. This paper approaches the problem via imitation and reinforcement learning using phase variables. Imitation learning is used to acquire primitive trajectories of the demonstrator interacting with the target. The temporal progress of the robot is represented as a function of the target's phase. Using a phase-oscillator formulation, reinforcement learning optimizes a temporal policy such that the robot can quickly react to large or unexpected changes in the target's movement. The phase representation decouples the temporal and spatial problems, allowing the use of fast online solutions. The methodology is applicable to both cyclic and single-stroke movements. We applied the proposed method to a real bi-manual humanoid upper body with 14 degrees of freedom, where the robot had to repeatedly push a ball hanging in front of it. In simulation, we show a human-robot interaction scenario where the robot changed its role from giver to receiver as a function of the interaction reward.
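The abstract does not give the paper's actual oscillator equations, but the core idea of driving a robot's phase toward a moving target's phase can be illustrated with a generic Kuramoto-style phase follower. In this minimal sketch (all names, gains, and frequencies are illustrative assumptions, not the paper's method), the robot's phase evolves at its natural frequency plus a sinusoidal coupling term that pulls it toward the target's phase, so the robot phase-locks to the target even when their natural frequencies differ:

```python
import math

def simulate_phase_follower(omega_robot=2.0, omega_target=3.0,
                            coupling=8.0, dt=0.001, steps=5000):
    """Generic Kuramoto-style phase follower (illustrative, not the
    paper's formulation): the robot phase phi is driven toward the
    target phase phi_t via a sinusoidal coupling term."""
    phi = 0.0    # robot phase (rad)
    phi_t = 1.0  # target phase (rad), starts out of sync
    for _ in range(steps):
        phi_t += omega_target * dt
        # natural frequency plus corrective coupling toward the target
        phi += (omega_robot + coupling * math.sin(phi_t - phi)) * dt
    # wrapped phase error in (-pi, pi]
    return math.atan2(math.sin(phi_t - phi), math.cos(phi_t - phi))
```

With a coupling gain larger than the frequency mismatch, the error settles at the constant phase lag asin((omega_target - omega_robot) / coupling), the standard phase-locking condition for this type of oscillator; a reinforcement-learned temporal policy, as described in the abstract, would modulate this temporal progress rather than use a fixed gain.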