Learning of Non-Parametric Control Policies with High-Dimensional State Features

Herke Van Hoof, Jan Peters, Gerhard Neumann
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:995-1003, 2015.

Abstract

Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-vanhoof15, title = {{Learning of Non-Parametric Control Policies with High-Dimensional State Features}}, author = {Van Hoof, Herke and Peters, Jan and Neumann, Gerhard}, booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics}, pages = {995--1003}, year = {2015}, editor = {Lebanon, Guy and Vishwanathan, S. V. N.}, volume = {38}, series = {Proceedings of Machine Learning Research}, address = {San Diego, California, USA}, month = {09--12 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v38/vanhoof15.pdf}, url = {https://proceedings.mlr.press/v38/vanhoof15.html}, abstract = {Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.} }
Endnote
%0 Conference Paper %T Learning of Non-Parametric Control Policies with High-Dimensional State Features %A Herke Van Hoof %A Jan Peters %A Gerhard Neumann %B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2015 %E Guy Lebanon %E S. V. N. Vishwanathan %F pmlr-v38-vanhoof15 %I PMLR %P 995--1003 %U https://proceedings.mlr.press/v38/vanhoof15.html %V 38 %X Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.
RIS
TY - CPAPER TI - Learning of Non-Parametric Control Policies with High-Dimensional State Features AU - Herke Van Hoof AU - Jan Peters AU - Gerhard Neumann BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics DA - 2015/02/21 ED - Guy Lebanon ED - S. V. N. Vishwanathan ID - pmlr-v38-vanhoof15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 38 SP - 995 EP - 1003 L1 - http://proceedings.mlr.press/v38/vanhoof15.pdf UR - https://proceedings.mlr.press/v38/vanhoof15.html AB - Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data. ER -
APA
Van Hoof, H., Peters, J. & Neumann, G. (2015). Learning of Non-Parametric Control Policies with High-Dimensional State Features. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:995-1003. Available from https://proceedings.mlr.press/v38/vanhoof15.html.