Learning of Non-Parametric Control Policies with High-Dimensional State Features


Herke Van Hoof, Jan Peters, Gerhard Neumann
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:995-1003, 2015.


Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.
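As a rough illustration of what a non-parametric, kernel-based control policy can look like, the sketch below computes a policy mean from weighted state-action samples with a squared-exponential kernel. It is not the authors' algorithm. The exponentiated sample weights (a common form of robust, relative-entropy-style update), the kernel choice, and the names `exp_weights`, `rbf_kernel`, `weighted_kernel_policy_mean`, `eta`, `bandwidth` and `lam` are all illustrative assumptions.

```python
import numpy as np

def exp_weights(returns, eta=1.0):
    # Hypothetical sample weights: exponentiated returns with temperature eta,
    # shifted by the maximum for numerical stability (largest weight is 1).
    returns = np.asarray(returns, dtype=float)
    return np.exp((returns - returns.max()) / eta)

def rbf_kernel(X, Y, bandwidth=1.0):
    # Squared-exponential kernel between the rows of X and the rows of Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / bandwidth**2)

def weighted_kernel_policy_mean(S, A, w, s_query, bandwidth=1.0, lam=1e-2):
    # S: (n, d) sampled state features, A: (n, m) sampled actions,
    # w: (n,) positive sample weights. Returns the policy mean at s_query.
    K = rbf_kernel(S, S, bandwidth)
    # Low-weight samples get a larger regularizer, so they pull the fit less.
    reg = lam * np.diag(1.0 / np.asarray(w, dtype=float))
    alpha = np.linalg.solve(K + reg, A)
    k_q = rbf_kernel(np.atleast_2d(s_query), S, bandwidth)
    return k_q @ alpha
```

Because the policy is defined directly through kernel evaluations on the samples, its complexity grows with the data rather than being fixed by a parametric form, which is the sense in which such a policy is non-parametric.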
